In [ ]:
%%shell
jupyter nbconvert --to html /content/nyc_property_price_deep_learning_demo.ipynb
[NbConvertApp] Converting notebook /content/nyc_property_price_deep_learning_demo.ipynb to html
[NbConvertApp] Writing 1173252 bytes to /content/nyc_property_price_deep_learning_demo.html
Out[ ]:

In [46]:
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive/two-sigma-case-study
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/two-sigma-case-study
In [52]:
%%capture
!pip install plotly==5.3.1
# !pip install optuna
!pip install "notebook>=5.3" "ipywidgets>=7.5"
!pip install optuna==1.0.0
In [53]:
# !pip3 install "notebook>=5.3" "ipywidgets>=7.5"
# general packages
import joblib
import logging
import datetime
import sys
import os
import warnings

# data
import pandas as pd
import numpy as np
import scipy
import seaborn as sns
import missingno as msno
import random
from itertools import zip_longest
import matplotlib.pyplot as plt
from scipy import stats
from scipy.stats import norm, skew
import re
import statsmodels.api as sm


# ML and DL
import optuna
from optuna.trial import TrialState
from sklearn import preprocessing, impute
from sklearn.experimental import enable_iterative_imputer
from sklearn.model_selection import StratifiedKFold, KFold, cross_val_score
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.linear_model import ElasticNet, Lasso, BayesianRidge, LassoLarsIC, LinearRegression
from sklearn.ensemble import RandomForestRegressor,  GradientBoostingRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.preprocessing import RobustScaler
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import AdaBoostRegressor,ExtraTreesRegressor
import torch 
from torch import nn, optim
import torch.nn.functional as F
from xgboost import XGBRegressor
import plotly

# options
plotly.offline.init_notebook_mode()
pd.options.display.max_rows = 4000
%config Completer.use_jedi = False
plt.style.use('seaborn')
color = sns.color_palette()
sns.set_style('darkgrid')
warnings.filterwarnings('ignore')

nyc-property-price-deep-learning-demo


take home interview for data scientist role. this repository is primarily to demonstrate deep learning modeling skillset, e.g. neural network architecture search (NAS) for nyc property price prediction (supervised) and denoising autoencoder nas (latent feature engineering). custom data cleaning modules, exploratory data analysis, and multiple linear regression parameter hypothesis testing also included.



Research Challenge:

NYC property price prediction


In this notebook I perform several analyses on Manhattan property sales data from 2020-2021.

My analyses can be summarised as the following

  • Build understanding through data cleaning and exploratory data analysis.
  • Conduct hypothesis tests to estimate relationships between key variables
  • Fit a wide range of machine learning and deep learning models (NAS for supervised nn and denoising autoencoder) to predict Manhattan property sales data.

Motivation: Build supervised learning model to predict property sales prices to uncover undervalued and overvalued properties.

Loss metric:: Across predictive modeling tasks we optimize for RMSE. We choose RMSE over MAE because we am assuming we are disproportionally sensitive to bad model predictions.

Hypothesis: While I define formal null and alternative hypotheses (with OLS regression assumptions about the features and target) for the MLR hypothesis testing, my hypothesis for predictive modeling is the following:

Increasing our feature set size, engineering features with denoising autoencoders, and increasing model complexity (NAS) will allow use to improve upon baseline model performance (baseline model is OLS with restricted feature set). While this might be trivial, it will be interesting to see which techniques increase model performance and to what extent.



Table of Contents:

1. Data Preparation and exploratory data analysis

a. Missing data and outlier removal
b. Encoding and scaling
c. Exploratory data analysis

2. Hypothesis Testing with Multiple Linear Regression

a. Null Hypothesis testing for f-test and t-test
b. Hypothesis testing and regression output

3. Predictive modeling

a. establish baseline model
b. comparision of supervised learning methods on small feature set
c. clean features for medium feature set
d. comparision of supervised machine learning models on medium feature set
e. neural network architecture search (optimize network topology, e.g. layers, nodes, dropout, etc.)
e. denoising autoencoder neural architecture search
f. denoising autoencoder latent feature extraction
g. neural network prediction on denoised feature set

4. Takeaways and next steps

a. Data cleaning
b. EDA
c. Hypothesis testing
d. Predictive modeling
e. Time spent



In [ ]:
sales_df_raw = pd.read_csv('rollingsales_manhattan.csv')
sales_df = sales_df_raw.copy(deep=True)
print(sales_df.shape)
(16539, 21)
In [ ]:
sales_df.describe().T
Out[ ]:
count mean std min 25% 50% 75% max
BOROUGH 16539.0 1.000000 0.000000 1.0 1.0 1.0 1.0 1.0
BLOCK 16539.0 1098.559828 523.789458 8.0 722.5 1158.0 1448.0 2250.0
LOT 16539.0 762.894250 905.833268 1.0 29.0 1003.0 1206.0 9108.0
EASEMENT 0.0 NaN NaN NaN NaN NaN NaN NaN
ZIP CODE 16539.0 10029.982889 36.009304 10001.0 10013.0 10022.0 10028.0 10463.0
RESIDENTIAL UNITS 8729.0 2.824149 12.351636 0.0 1.0 1.0 1.0 490.0
COMMERCIAL UNITS 1854.0 2.313376 11.998357 0.0 0.0 1.0 1.0 259.0
TOTAL UNITS 9233.0 3.134517 13.345840 0.0 1.0 1.0 1.0 492.0
YEAR BUILT 14677.0 1953.673980 38.134771 1800.0 1920.0 1956.0 1986.0 2021.0
TAX CLASS AT TIME OF SALE 16539.0 2.077997 0.465095 1.0 2.0 2.0 2.0 4.0

1. Data Preparation


In this section we clean the data enough to perform basic EDA, i.e. drop missingness, change data types, etc. Most of the decisions implemented here are a result of much more investigation not shown here. For the readability of this notebook, I've removed most of the output that led to these decisions.


a. Missing data and outlier removal


Below we see that we have a number of features with ~50% or more missing values. Let's drop missing data from these features liberally for the time being.


Other EDA that is not included here reassures me that dropping this data is reasonable for now.


In [ ]:
# missing data by feature
sales_df.isna().sum()
Out[ ]:
BOROUGH                               0
NEIGHBORHOOD                          0
BUILDING CLASS CATEGORY               0
TAX CLASS AT PRESENT                 19
BLOCK                                 0
LOT                                   0
EASEMENT                          16539
BUILDING CLASS AT PRESENT            19
ADDRESS                               0
APARTMENT NUMBER                   8651
ZIP CODE                              0
RESIDENTIAL UNITS                  7810
COMMERCIAL UNITS                  14685
TOTAL UNITS                        7306
LAND SQUARE FEET                  15189
GROSS SQUARE FEET                 15189
YEAR BUILT                         1862
TAX CLASS AT TIME OF SALE             0
BUILDING CLASS AT TIME OF SALE        0
SALE PRICE                            0
SALE DATE                             0
dtype: int64


Feel very comfortable dropping variables with about >50% missing values (may revisit later),
will create new category for missing and dummy out remaining missing values in nominal variables

In [ ]:
sales_df.drop(['EASEMENT', 'BOROUGH','APARTMENT NUMBER', 'LAND SQUARE FEET', 'GROSS SQUARE FEET', 'COMMERCIAL UNITS', 'TOTAL UNITS', 'RESIDENTIAL UNITS','BLOCK', 'LOT','ADDRESS'], axis = 1, inplace = True) 

Drop outliers; encode, impute, and scale features; and other data prep


For the time being, we are going to restrict our df to include what what seem like our most important and interpretable features, while avoiding high cardinality nominal features (will need to dummy out can high dimensionality will compromise our unpenalized regression models). This way we can spend less time cleaning and begin hypothesis testing using multiple linear regression more quickly.

Later on, when we run predictive models, we clean other features and assess their importance with respect to model performance.

In [ ]:
# print cardinality
for i in sales_df.columns:
    print(i, sales_df[str(i)].value_counts().shape[0])
NEIGHBORHOOD 39
BUILDING CLASS CATEGORY 39
TAX CLASS AT PRESENT 8
BUILDING CLASS AT PRESENT 106
ZIP CODE 46
YEAR BUILT 143
TAX CLASS AT TIME OF SALE 3
BUILDING CLASS AT TIME OF SALE 107
SALE PRICE 3390
SALE DATE 326
In [ ]:
# features to include in MLR model
raw_feats_small = ['SALE PRICE', 'NEIGHBORHOOD', 'YEAR BUILT', 'SALE DATE', 
                   'TAX CLASS AT TIME OF SALE','BUILDING CLASS CATEGORY']
small_df = sales_df.copy()
In [ ]:
# clean date format to account for cyclical natural of months in data
def encode(data, col, max_val):
    data[col + '_sin'] = np.sin(2 * np.pi * data[col]/max_val)
    data[col + '_cos'] = np.cos(2 * np.pi * data[col]/max_val)
    return data

small_df['SALE DATE'] = pd.to_datetime(small_df['SALE DATE'], infer_datetime_format=True)

small_df['SALE MONTH'] = small_df['SALE DATE'].dt.month
small_df = encode(small_df, 'SALE MONTH', 12)

small_df['SALE YEAR'] = small_df['SALE DATE'].dt.year.apply(lambda x: str(x))


Define custom classes I wrote for imputing, encoding, and scaling features


In [ ]:
class CategoricalFeatures:
    def __init__(self, df, nominal_features, ordinal_features=[], encoding_type='ohe', handle_na=False):
        """
        df: pandas dataframe
        nominal_features: list of nominal column names
        ordinal_features: list of ordinal column names
        encoding_type: label or ohe
        handle_na: True/False
        """
        self.df = df
        self.nom_feats = nominal_features
        self.ord_feats = ordinal_features
        self.enc_type = encoding_type
        self.handle_na = handle_na
        self.output_df = self.df.copy(deep=True)
        self.missing_indicators = None

        if self.handle_na:
            missing_col_names = self.output_df.columns[self.output_df.isnull().any()].to_list()
            self.missing_indicators = self.output_df[missing_col_names].isnull().astype(int).add_suffix('_na_flag')
            
            for c in self.ord_feats:
                num = self.output_df[c].mode()[0]
                self.output_df[c].fillna(num, inplace = True)
            
            for c in self.nom_feats:
                self.output_df.loc[:, c] = self.output_df.loc[:, c].astype(str).fillna("miss")
    
    def _label_encoding(self):
        for c in self.ord_feats:
            lbl = preprocessing.LabelEncoder()
            lbl.fit(self.output_df[c].astype(str).values)
            self.output_df.loc[:, c] = lbl.transform(self.output_df[c].astype(str).values)
            
        if self.handle_na:
            self.output_df = pd.concat([self.output_df, self.missing_indicators], 1)

        return self.output_df

    def _one_hot(self):
        return pd.get_dummies(self.output_df, columns=self.nom_feats)

    def _fit_transform(self):
        if self.enc_type == "label":
            return self._label_encoding()
        elif self.enc_type == "ohe":
            return self._one_hot()
        elif self.enc_type == "mixed":
            self._label_encoding()
            return self._one_hot()
        else:
            raise Exception("Encoding type not understood")              
In [ ]:
class NumericFeatures:
    def __init__(self, df, numeric_features, imputation_type, add_na_flags, scale_type):
        """
        df: pandas dataframe
        numeric_features: list of numeric column names
        imputation_type: mean, median, MICE?, None
        add_nan_flags: add missing indicators for imputed rows
        scale_type: quantile_transform, z_score, None
        """
        self.df = df
        self.num_feats = numeric_features
        self.imp_type = imputation_type
        self.na_flag = add_na_flags
        self.scale_type = scale_type
        self.output_df = self.df.copy(deep=True)

    def _impute(self):
        # create missing indicators
        if self.na_flag:
            missing_col_names = self.output_df.columns[self.output_df.isnull().any()].to_list()
            missing_indicators = self.output_df[missing_col_names].isnull().astype(int).add_suffix('_na_flag')
        
        # assign numeric features
        num_vars = self.output_df[self.num_feats].values

        # initialize imputer
        if self.imp_type == "mean":
            imputer = impute.SimpleImputer(strategy='mean')
        elif self.imp_type == "median":
            imputer = impute.SimpleImputer(strategy='median')
        elif self.imp_type == "mice":
            imputer = IterativeImputer()         
        elif self.imp_type == "knn":
            imputer = impute.KNNImputer()
        else:
            raise Exception("Impute type not understood")

        # impute and assign filled feats
        imputed_num_vars = imputer.fit_transform(num_vars)
        self.output_df[self.num_feats] = imputed_num_vars

        # append missing indicators
        if self.na_flag:
            self.output_df = pd.concat([self.output_df, missing_indicators], 1)

        return self.output_df

    def _scale(self):
        # define num var values
        num_vars = self.output_df[self.num_feats].values

        # select scaler
        if self.scale_type == "standardize":     
            scaler = preprocessing.StandardScaler()
        elif self.scale_type == "robust_standardize":     
            scaler = preprocessing.RobustScaler()
        elif self.scale_type == "min_max":     
            scaler = preprocessing.MinMaxScaler()
        elif self.scale_type == "quantile_transform":
            scaler = preprocessing.QuantileTransformer(output_distribution='normal')
        else:
            raise Exception("Scale type not understood")              

        # scale num vars
        scaled_num_vars = scaler.fit_transform(num_vars)
        self.output_df[self.num_feats] = scaled_num_vars

        return self.output_df

    def _fit_transform(self):
        if self.imp_type is not None and self.scale_type is not None:
            self._impute()
            return self._scale()
        elif self.imp_type is not None:
            return self._impute()
        elif self.scale_type is not None:
            return self._scale()
        else:
            raise Exception("Encoding type not understood")




b. Encoding and scaling:


Label encoding ordinal variables, one hot encode nominal varibles, and add missing indicators using custom classes I wrote for data prep

In [ ]:
for c in ['NEIGHBORHOOD', 'TAX CLASS AT TIME OF SALE', 'BUILDING CLASS CATEGORY']:
    small_df.loc[small_df[c].value_counts()[small_df[c]].values < 50, c] = "RARE_VALUE"
In [ ]:
nom_feats_small = ['NEIGHBORHOOD', 'TAX CLASS AT TIME OF SALE', 'BUILDING CLASS CATEGORY']
ord_feats_small = ['YEAR BUILT', 'SALE YEAR']

cat_feats = CategoricalFeatures(small_df, 
                                nominal_features = nom_feats_small,
                                ordinal_features = ord_feats_small, 
                                encoding_type="mixed",
                                handle_na=True)
df_cat_transformed = cat_feats._fit_transform()

No imputation or scaling necessary right now for our numeric features

In [ ]:
num_feats_small = ['SALE MONTH_sin', 'SALE MONTH_cos']

For high cardinality nominal variables, recode low incidence levels

In [ ]:
for c in ['NEIGHBORHOOD', 'TAX CLASS AT TIME OF SALE', 'BUILDING CLASS CATEGORY']:
    small_df.loc[small_df[c].value_counts()[small_df[c]].values < 50, c] = "RARE_VALUE"

c. Exploratory data analysis:


Check target distribution and clean

In [ ]:
def check_skewness(df, col):
    """ Plot distribution and descriptive stats"""
    sns.distplot(df[col] , fit=norm);
    fig = plt.figure()
    res = stats.probplot(df[col], plot=plt)
    
    # Get the fitted parameters used by the function
    (mean, stddev) = norm.fit(df[col])
    print( '\n mean = {:.2f} and stddev = {:.2f}\n'.format(mean, stddev))
In [ ]:
# clean our target var
df_cat_transformed['SALE PRICE'] = df_cat_transformed['SALE PRICE'].apply(lambda x: int(re.sub(',', '',x)))
check_skewness(df_cat_transformed, 'SALE PRICE')
 mean = 2572272.60 and stddev = 13449275.03

Our target is highly right skewed. We will remove some outliers but leave right skewed for now, but may apply log transform to make our target more normal before predictive modeling.

In [ ]:
# remove outliers and 0s, could do more here (including scaling), but will leave for now
df_cat_transformed = df_cat_transformed[df_cat_transformed['SALE PRICE'].between(10000, df_cat_transformed['SALE PRICE'].quantile(.95))]
check_skewness(df_cat_transformed, 'SALE PRICE')
 mean = 1597464.34 and stddev = 1375615.40

Plot median daily sales price

Eventhough our target is highly right skewed, our median daily sales prices series looks approximately stationary, i.e. has constant mean and variance. This suggests that sale date has spurious relationship with sales price.

In [ ]:
median_price_series = df_cat_transformed.groupby('SALE DATE')['SALE PRICE'].agg(['median'])
df_cat_transformed.drop(['SALE DATE', 'SALE MONTH'], axis=1, inplace=True)
plt.plot(median_price_series)  
Out[ ]:
[<matplotlib.lines.Line2D at 0x7fa3fc7e5a10>]

Look at features with highest pairwise correlations, no noteworthy relationships here. Also, notice we added our target for good measure, despite not being highly correlated with any features.

We could run more meaningful independence/collinearity tests for ordinal and dummy data here (e.g. chi-squared test, mutual information test, etc.)

In [ ]:
corrmat = df_cat_transformed.corr()
top_corr_features = set()
for c in list(corrmat.index):
    top_corr_features.update(set(corrmat.index[abs(corrmat[c]).between(0.5,.99)])) 
        
top_corr_features.update({'SALE PRICE'})
plt.figure(figsize=(10,10))
g = sns.heatmap(df_cat_transformed[sorted(top_corr_features)].corr(),annot=True,cmap="RdYlGn")

We can see that we don't have strong correlation or linear relationships from the scatterplots below. Another good reason to look at more meaningful independence/collinearity tests for ordinal and dummy data (e.g. chi-squared test, mutual information test, etc.)

Please excuse jupyter formatting!

In [ ]:
sns.set()
cols = list(top_corr_features)[:5]
sns.pairplot(df_cat_transformed[cols])
Out[ ]:
<seaborn.axisgrid.PairGrid at 0x7fa3fc6a5dd0>

2. Hypothesis Testing with Multiple Linear Regression¶

a. Null Hypothesis for t-test and f-test


In this section I run several hypothesis tests using MLR:

$\text{1. Full model F-Test:}$

${H}_{0} : {\ \beta}_{1} = {\beta}_{2} = {\beta}_{3} = ... = {\beta}_{m} = 0$
${H}_{A} :$ At least one ${\ \beta}_{i}{\ \not=\ } 0 \text{ , for }\textit{i}\text{ in 1, 2, 3,..., m} $

Research question 1: Is a regression model containing at least one predictor useful in predicting the size of the sales price? In other words, is our full model useful at predicting sales price.
We will include predictors encoded from the following features : ['SALE PRICE', 'NEIGHBORHOOD', 'YEAR BUILT', 'SALE DATE', 'TAX CLASS AT TIME OF SALE','BUILDING CLASS CATEGORY']


$\text{2. Full model T-Test:}$

${H}_{0} : {\ \beta}_{taxclass_1} = 0$
${H}_{A} : {\ \beta}_{taxclass_1}{\ \not=\ } 0$

Research question 2: While controlling for all other features in our model, is the predictor 'TAX CLASS AT TIME OF SALE_1' in the full regression model significantly linearly related to the sales price? (Tax class 1: Includes most residential property of up to three units, vacant land that is zoned for residential use, and most condominiums that are not more than three stories.)

In other words, The t-test is a test for the marginal significance of the x1 predictor after controlling for all the rest of the full model predictor set.



Hypothesis testing and regression output:


$\text{Full model F-Test:}$

In [ ]:
df_cat_transformed = df_cat_transformed.sample(frac=1, random_state=42)
df_cat_trans_small = df_cat_transformed[:].copy()
In [ ]:
# select columns described above in full model F-test
df_cat_transformed.drop(['TAX CLASS AT PRESENT', 'ZIP CODE',
                         'BUILDING CLASS AT TIME OF SALE','TAX CLASS AT PRESENT_na_flag', 
                         'BUILDING CLASS AT PRESENT_na_flag', 'BUILDING CLASS AT PRESENT', 
                         'NEIGHBORHOOD_ALPHABET CITY'
                         ],axis=1,inplace=True)
In [ ]:
X = df_cat_transformed.drop('SALE PRICE', axis=1)
X = sm.add_constant(X) # add intercept term
y = df_cat_transformed['SALE PRICE']

results = sm.OLS(y, X).fit()
results.summary()
Out[ ]:
OLS Regression Results
Dep. Variable: SALE PRICE R-squared: 0.292
Model: OLS Adj. R-squared: 0.288
Method: Least Squares F-statistic: 77.85
Date: Fri, 17 Sep 2021 Prob (F-statistic): 0.00
Time: 15:36:59 Log-Likelihood: -1.8975e+05
No. Observations: 12337 AIC: 3.796e+05
Df Residuals: 12271 BIC: 3.801e+05
Df Model: 65
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const 1.468e+06 1.09e+05 13.510 0.000 1.26e+06 1.68e+06
YEAR BUILT 1075.1743 408.783 2.630 0.009 273.895 1876.454
SALE MONTH_sin -2.406e+04 2.76e+04 -0.872 0.383 -7.82e+04 3e+04
SALE MONTH_cos -2.292e+04 2.22e+04 -1.034 0.301 -6.64e+04 2.05e+04
SALE YEAR 4.064e+04 4.84e+04 0.839 0.401 -5.43e+04 1.36e+05
YEAR BUILT_na_flag -1.098e+05 4.56e+04 -2.410 0.016 -1.99e+05 -2.05e+04
NEIGHBORHOOD_CHELSEA 8.39e+05 1.26e+05 6.679 0.000 5.93e+05 1.09e+06
NEIGHBORHOOD_CHINATOWN 3.42e+05 2.1e+05 1.632 0.103 -6.87e+04 7.53e+05
NEIGHBORHOOD_CIVIC CENTER 1.601e+06 1.45e+05 11.010 0.000 1.32e+06 1.89e+06
NEIGHBORHOOD_CLINTON -2.793e+04 1.42e+05 -0.196 0.844 -3.07e+05 2.51e+05
NEIGHBORHOOD_EAST VILLAGE 4.882e+05 1.54e+05 3.168 0.002 1.86e+05 7.9e+05
NEIGHBORHOOD_FASHION 3.739e+05 1.96e+05 1.907 0.057 -1.05e+04 7.58e+05
NEIGHBORHOOD_FINANCIAL -4.054e+04 1.36e+05 -0.298 0.766 -3.07e+05 2.26e+05
NEIGHBORHOOD_FLATIRON 1.083e+06 1.43e+05 7.590 0.000 8.03e+05 1.36e+06
NEIGHBORHOOD_GRAMERCY 4.789e+05 1.31e+05 3.663 0.000 2.23e+05 7.35e+05
NEIGHBORHOOD_GREENWICH VILLAGE-CENTRAL 1.016e+06 1.27e+05 7.981 0.000 7.67e+05 1.27e+06
NEIGHBORHOOD_GREENWICH VILLAGE-WEST 1.091e+06 1.29e+05 8.492 0.000 8.4e+05 1.34e+06
NEIGHBORHOOD_HARLEM-CENTRAL -5.215e+05 1.3e+05 -4.017 0.000 -7.76e+05 -2.67e+05
NEIGHBORHOOD_HARLEM-EAST -6.178e+05 1.64e+05 -3.771 0.000 -9.39e+05 -2.97e+05
NEIGHBORHOOD_HARLEM-UPPER -5.423e+05 1.96e+05 -2.762 0.006 -9.27e+05 -1.57e+05
NEIGHBORHOOD_INWOOD 1.101e+04 1.75e+05 0.063 0.950 -3.31e+05 3.53e+05
NEIGHBORHOOD_JAVITS CENTER 2.063e+06 1.88e+05 10.999 0.000 1.7e+06 2.43e+06
NEIGHBORHOOD_KIPS BAY -2.259e+05 1.44e+05 -1.566 0.117 -5.09e+05 5.69e+04
NEIGHBORHOOD_LITTLE ITALY 1.206e+06 2.24e+05 5.384 0.000 7.67e+05 1.64e+06
NEIGHBORHOOD_LOWER EAST SIDE 2.33e+05 1.35e+05 1.725 0.085 -3.18e+04 4.98e+05
NEIGHBORHOOD_MANHATTAN VALLEY -2.94e+05 1.44e+05 -2.043 0.041 -5.76e+05 -1.19e+04
NEIGHBORHOOD_MIDTOWN CBD 3.776e+05 1.55e+05 2.436 0.015 7.38e+04 6.81e+05
NEIGHBORHOOD_MIDTOWN EAST 1.252e+05 1.25e+05 1.003 0.316 -1.19e+05 3.7e+05
NEIGHBORHOOD_MIDTOWN WEST 2.235e+05 1.29e+05 1.735 0.083 -2.9e+04 4.76e+05
NEIGHBORHOOD_MORNINGSIDE HEIGHTS 2.085e+05 1.66e+05 1.252 0.210 -1.18e+05 5.35e+05
NEIGHBORHOOD_MURRAY HILL 1.598e+05 1.29e+05 1.240 0.215 -9.28e+04 4.12e+05
NEIGHBORHOOD_RARE_VALUE -4.698e+04 2.12e+05 -0.221 0.825 -4.63e+05 3.69e+05
NEIGHBORHOOD_SOHO 1.08e+06 1.32e+05 8.183 0.000 8.21e+05 1.34e+06
NEIGHBORHOOD_SOUTHBRIDGE 8.98e+05 1.57e+05 5.704 0.000 5.89e+05 1.21e+06
NEIGHBORHOOD_TRIBECA 1.185e+06 1.33e+05 8.881 0.000 9.23e+05 1.45e+06
NEIGHBORHOOD_UPPER EAST SIDE (59-79) 6.372e+05 1.21e+05 5.247 0.000 3.99e+05 8.75e+05
NEIGHBORHOOD_UPPER EAST SIDE (79-96) 6.227e+05 1.21e+05 5.127 0.000 3.85e+05 8.61e+05
NEIGHBORHOOD_UPPER EAST SIDE (96-110) 1.179e+06 2.22e+05 5.308 0.000 7.43e+05 1.61e+06
NEIGHBORHOOD_UPPER WEST SIDE (59-79) 7.977e+05 1.22e+05 6.558 0.000 5.59e+05 1.04e+06
NEIGHBORHOOD_UPPER WEST SIDE (79-96) 7.59e+05 1.25e+05 6.055 0.000 5.13e+05 1e+06
NEIGHBORHOOD_UPPER WEST SIDE (96-116) 4.52e+05 1.35e+05 3.345 0.001 1.87e+05 7.17e+05
NEIGHBORHOOD_WASHINGTON HEIGHTS LOWER -3.329e+05 1.76e+05 -1.891 0.059 -6.78e+05 1.22e+04
NEIGHBORHOOD_WASHINGTON HEIGHTS UPPER -1.846e+05 1.44e+05 -1.286 0.198 -4.66e+05 9.68e+04
TAX CLASS AT TIME OF SALE_1 4.933e+05 2.74e+05 1.797 0.072 -4.48e+04 1.03e+06
TAX CLASS AT TIME OF SALE_2 -7.766e+05 2.27e+05 -3.414 0.001 -1.22e+06 -3.31e+05
TAX CLASS AT TIME OF SALE_4 1.751e+06 1.98e+05 8.834 0.000 1.36e+06 2.14e+06
BUILDING CLASS CATEGORY_01 ONE FAMILY DWELLINGS 1.768e+06 3.62e+05 4.881 0.000 1.06e+06 2.48e+06
BUILDING CLASS CATEGORY_02 TWO FAMILY DWELLINGS 1.23e+06 3.62e+05 3.395 0.001 5.2e+05 1.94e+06
BUILDING CLASS CATEGORY_03 THREE FAMILY DWELLINGS 1.635e+06 3.96e+05 4.125 0.000 8.58e+05 2.41e+06
BUILDING CLASS CATEGORY_07 RENTALS - WALKUP APARTMENTS 2.437e+06 2.09e+05 11.641 0.000 2.03e+06 2.85e+06
BUILDING CLASS CATEGORY_08 RENTALS - ELEVATOR APARTMENTS 1.339e+06 2.57e+05 5.200 0.000 8.34e+05 1.84e+06
BUILDING CLASS CATEGORY_09 COOPS - WALKUP APARTMENTS -4.109e+05 1.97e+05 -2.091 0.037 -7.96e+05 -2.57e+04
BUILDING CLASS CATEGORY_10 COOPS - ELEVATOR APARTMENTS -1.452e+05 1.92e+05 -0.757 0.449 -5.21e+05 2.31e+05
BUILDING CLASS CATEGORY_11 SPECIAL CONDO BILLING LOTS 6.342e+05 2.29e+05 2.766 0.006 1.85e+05 1.08e+06
BUILDING CLASS CATEGORY_12 CONDOS - WALKUP APARTMENTS 3.76e+05 2.27e+05 1.656 0.098 -6.92e+04 8.21e+05
BUILDING CLASS CATEGORY_13 CONDOS - ELEVATOR APARTMENTS 6.362e+05 1.91e+05 3.325 0.001 2.61e+05 1.01e+06
BUILDING CLASS CATEGORY_14 RENTALS - 4-10 UNIT 2.155e+06 2.71e+05 7.963 0.000 1.62e+06 2.69e+06
BUILDING CLASS CATEGORY_15 CONDOS - 2-10 UNIT RESIDENTIAL 1.236e+06 2.05e+05 6.041 0.000 8.35e+05 1.64e+06
BUILDING CLASS CATEGORY_17 CONDO COOPS -4.588e+05 1.95e+05 -2.351 0.019 -8.41e+05 -7.63e+04
BUILDING CLASS CATEGORY_21 OFFICE BUILDINGS -7.062e+05 3.08e+05 -2.292 0.022 -1.31e+06 -1.02e+05
BUILDING CLASS CATEGORY_22 STORE BUILDINGS 6.208e+05 2.93e+05 2.118 0.034 4.63e+04 1.2e+06
BUILDING CLASS CATEGORY_31 COMMERCIAL VACANT LAND -8.305e+05 4.67e+05 -1.779 0.075 -1.75e+06 8.46e+04
BUILDING CLASS CATEGORY_43 CONDO OFFICE BUILDINGS -1.386e+06 2.43e+05 -5.714 0.000 -1.86e+06 -9.11e+05
BUILDING CLASS CATEGORY_44 CONDO PARKING -9.561e+05 2.86e+05 -3.345 0.001 -1.52e+06 -3.96e+05
BUILDING CLASS CATEGORY_45 CONDO HOTELS -3.723e+06 2.66e+05 -13.991 0.000 -4.24e+06 -3.2e+06
BUILDING CLASS CATEGORY_46 CONDO STORE BUILDINGS -2.065e+06 2.91e+05 -7.108 0.000 -2.63e+06 -1.5e+06
BUILDING CLASS CATEGORY_47 CONDO NON-BUSINESS STORAGE -1.95e+06 2.2e+05 -8.859 0.000 -2.38e+06 -1.52e+06
BUILDING CLASS CATEGORY_RARE_VALUE 3.175e+04 1.65e+05 0.192 0.848 -2.92e+05 3.56e+05
Omnibus: 3393.143 Durbin-Watson: 1.991
Prob(Omnibus): 0.000 Jarque-Bera (JB): 9632.233
Skew: 1.450 Prob(JB): 0.00
Kurtosis: 6.214 Cond. No. 2.95e+16


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 9.49e-26. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.

We see a very large F-test statistic and very small associate p-value in the regression output above (F-stat = 77.85, p-value = 0.00, with rounding).

The p-value is the probability — if the null hypothesis were true — that we would get an F-statistic larger than 77.85.

Therefore, the probability that we would get an F-statistic larger than we observed is very close to 0. Assuming we have a significance level ${\alpha}$ = 0.05, there is sufficient evidence to reject the null hypothese and conclude that at least one of the ${\beta}$s in our full model is not equal to 0.

$\text{Full model T-Test:}$

In the output 2 cells above, we see a very large t-test statistic and very small associate p-value in the regression output above (t-stat = 2.157, p-value = 0.0310, with rounding).

The probability that we would get a t-stat larger than we observed is very close to 0. Assuming we have a significance level ${\alpha}$ = 0.05, there is sufficient evidence to reject the null hypothese and conclude that TAX CLASS 1 has a significant relationship with sales price and the associatiad ${\beta}$s in our full model is not equal to 0.




3. Predictive modeling


In this section we will we take the following steps:

  • establish baseline model
  • comparision of supervised learning methods on small feature set
  • denoising autoencoder to extract denoised features from latent repsentation of our original features
  • comparison of models on denoised features


a. Establish baseline model:


Run unpenalized linear regression on small df to get baseline model loss to try to improve upon

In [ ]:
n_folds = 5

kfold = KFold(n_splits=n_folds, shuffle=False)
cv_results = np.sqrt(-cross_val_score(LinearRegression(), 
                                          X.values, 
                                          y, 
                                          cv=kfold, 
                                          scoring="neg_mean_squared_error"))

print("%s: %f (%f)" % ('BASELINE LR: ', cv_results.mean(), cv_results.std()))
# 1166561.158450 (10486.839846)
BASELINE LR: : 1166561.158450 (10486.839846)

b. Comparision of machine learning models on small feature set:



First, we will run several models and see which perform best with default hyperparameters.

In [ ]:
# models
models = []
models.append(('LR', LinearRegression()))
models.append(('Lasso', Lasso()))
models.append(('AB', AdaBoostRegressor()))
models.append(('XGBR', XGBRegressor()))
models.append(('GBM', GradientBoostingRegressor(random_state = 5)))
models.append(('RF', RandomForestRegressor()))


# build dataframe
results = []
names = []
means = []
stddevs = []
n_folds = 5
for name, model in models:
    kfold = KFold(n_splits=n_folds, shuffle=False)
    cv_results = np.sqrt(-cross_val_score(model, 
                                          X.values, 
                                          y, 
                                          cv=kfold, 
                                          scoring="neg_mean_squared_error"))
    results.append(cv_results)
    means.append(cv_results.mean())
    stddevs.append(cv_results.std())
    names.append(name)
              
df_results = pd.DataFrame(np.array(results).T, columns = names)
[15:37:04] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
[15:37:05] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
[15:37:06] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
[15:37:08] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
[15:37:09] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.

Plot MSE of folds for each fitted ML model

In [ ]:
# print mean and stddev root mean squared error from 10 fold 
for col in df_results:
    print("%s: %f (%f)" % (col, df_results[col].mean(), df_results[col].std()))

fig, ax = plt.subplots(figsize=(10,10))
fig = sns.boxplot(data=df_results)
fig = fig.set_xticklabels(labels = df_results.columns,fontdict={'fontsize':15})
ax.set_xlabel('models', size = 20)
ax.set_ylabel('mean squared error', size = 20)
LR: 1166561.158450 (11724.643383)
Lasso: 1166560.268920 (11724.028444)
AB: 1332329.213313 (29032.612953)
XGBR: 1121726.867097 (11113.890558)
GBM: 1121708.064774 (9642.460553)
RF: 1083662.221916 (19112.559204)
Out[ ]:
Text(0, 0.5, 'mean squared error')

Random forest, on average, seems to perform the best out of all of our models before tuning. Several of these models out perform our basline model by a large margin, which supports our hypothesis of model complexity and features improving upon our baseline model performance.


c. Clean features for medium feature set:



First, we will run several models and see which perform best with default hyperparameters.

Create df with previously dropped nominal features (nom_feats_medium). Then we impute and encode new vars

In [ ]:
medium_df = df_cat_trans_small[:].copy()
In [ ]:
nom_feats_medium = ['ZIP CODE', 'TAX CLASS AT PRESENT', 'BUILDING CLASS AT PRESENT', 'BUILDING CLASS AT TIME OF SALE']
for c in nom_feats_medium:
    num = medium_df[c].mode()[0]
    medium_df[c].fillna(num, inplace = True)
    medium_df.loc[medium_df[c].value_counts()[medium_df[c]].values < 50, c] = "RARE_VALUE"

cat_feats_med = CategoricalFeatures(medium_df, 
                                nominal_features = nom_feats_medium,
                                ordinal_features = [], 
                                encoding_type="ohe",
                                handle_na=True)
df_cat_trans_med = cat_feats_med._fit_transform()
In [ ]:
X_medium = df_cat_trans_med.drop('SALE PRICE',axis=1)
y_medium = df_cat_trans_med['SALE PRICE']
In [ ]:
# models
models = []
models.append(('XGBR', XGBRegressor()))
models.append(('RF', RandomForestRegressor()))

# build dataframe
results = []
names = []
means = []
stddevs = []
n_folds = 3
for name, model in models:
    kfold = KFold(n_splits=n_folds, shuffle=False)
    cv_results = np.sqrt(-cross_val_score(model, 
                                          X_medium.values, 
                                          y_medium, 
                                          cv=kfold, 
                                          scoring="neg_mean_squared_error"))
    results.append(cv_results)
    means.append(cv_results.mean())
    stddevs.append(cv_results.std())
    names.append(name)
              
df_results = pd.DataFrame(np.array(results).T, columns = names)
[15:26:51] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
[15:26:53] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
[15:26:55] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
In [ ]:
# print mean and stddev root mean squared error from 10 fold 
for col in df_results:
    print("%s: %f (%f)" % (col, df_results[col].mean(), df_results[col].std()))

fig, ax = plt.subplots(figsize=(10,10))
fig = sns.boxplot(data=df_results)
fig = fig.set_xticklabels(labels = df_results.columns,fontdict={'fontsize':15})
ax.set_xlabel('models', size = 20)
ax.set_ylabel('mean squared error', size = 20)
XGBR: 1098027.392121 (20543.130451)
RF: 1072909.568417 (791.444332)
Out[ ]:
Text(0, 0.5, 'mean squared error')

d. Comparision of supervised machine learning models on medium feature set:


For Random Forest, our best performing model, we'll do a gridsearch of a hyperparameter dictionary and find the best performing hyperparameter combination

In [ ]:
from sklearn.model_selection import RandomizedSearchCV
param_grid = {'n_estimators' : [300, 600, 900], 'max_depth': [10, 30, None], 'min_samples_split':[5, 10, 30], 'min_samples_leaf':[2,5,10]}
grid = RandomizedSearchCV(estimator = RandomForestRegressor(random_state = 5),
                    param_distributions = param_grid, 
                    n_iter = 5,
                    scoring = 'neg_mean_squared_error', 
                    cv = 2, 
                    n_jobs = -1)

# %%capture
grid_result = grid.fit(X_medium.values, y_medium)

print("Best: %f using %s" % (np.sqrt(-grid_result.best_score_), grid_result.best_params_))
means = np.sqrt(-grid_result.cv_results_['mean_test_score'])
params = grid_result.cv_results_['params']
for mean, param in zip(means, params):
    print("%f with: %r" % (mean, param))
Best: 1050134.695893 using {'n_estimators': 300, 'min_samples_split': 10, 'min_samples_leaf': 2, 'max_depth': None}
1082516.907512 with: {'n_estimators': 900, 'min_samples_split': 5, 'min_samples_leaf': 10, 'max_depth': None}
1050134.695893 with: {'n_estimators': 300, 'min_samples_split': 10, 'min_samples_leaf': 2, 'max_depth': None}
1060880.503972 with: {'n_estimators': 900, 'min_samples_split': 10, 'min_samples_leaf': 5, 'max_depth': None}
1088038.820714 with: {'n_estimators': 900, 'min_samples_split': 10, 'min_samples_leaf': 2, 'max_depth': 10}
1082963.551891 with: {'n_estimators': 300, 'min_samples_split': 5, 'min_samples_leaf': 10, 'max_depth': None}

e. Supervised neural network prediction:



In this section we optimize our neural network model by training a range od network over the parameter and hyperparameter space.

Below we define classes and helper functions to help train (i.e. minimize RMSE with respect to network weights) a range of networks over the network topology parameters (i.e. layers, number of nodes, regularization, etc.) and hyperparamaters (i.e. learning rate, optimizer, epochs, etc.).

In [ ]:
def seed_everything(seed=42):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    sampler = optuna.samplers.RandomSampler(seed)
    
seed_everything(seed=42)
In [ ]:
# data set class
class Dataset:
    def __init__(self, features, targets):
        self.features = features
        self.targets = targets
    
    def __len__(self):
        return self.features.shape[0]
    
    def __getitem__(self, item):
        return {
            "x": torch.tensor(self.features[item, :], dtype=torch.float),
            "y": torch.tensor(self.targets[item], dtype=torch.float),
        }
In [ ]:
# function to optimize neural nets
def define_model(trial, param_ranges_dict):
    # We optimize the number of layers, hidden units and dropout ratio in each layer.
    # n_layers = trial.suggest_int("n_layers", 1, 3)
    # batch_norm = trial.suggest_int("batch_norm", 0, 1)  
    batch_norm = 0
    n_layers = sum('n_units' in s for s in list(param_ranges_dict.keys()))
    layers = []
    in_features = len(feature_columns) - 1

 
    for i in range(n_layers):
        out_features = trial.suggest_int("n_units_l{}".format(i),
                                         param_ranges_dict[f'n_units_l{i}'][0], 
                                         param_ranges_dict[f'n_units_l{i}'][1])

        layers.append(nn.Linear(in_features, out_features))      
        
        if batch_norm:
            layers.append(nn.BatchNorm1d(out_features))
            
        layers.append(nn.ReLU())
        # layers.append(nn.Dropout(p))
        in_features = out_features
    layers.append(nn.Linear(in_features, 1))    

    return nn.Sequential(*layers)
In [ ]:
# class to hold training loops, and loss and noise functions
class Engine:
    def __init__(self, model, optimizer, device):
        self.model = model
        self.optimizer = optimizer
        self.device = device
    
    # statis cause not using anything from init
    @staticmethod
    def loss_fn(outputs, targets):
        targets = targets.unsqueeze(1)
        assert outputs.size() == targets.size()
        assert torch.is_tensor(outputs) == torch.is_tensor(targets)
        
        criterion = nn.MSELoss()              
        return torch.sqrt(torch.mean((outputs - targets)**2))
    
    @staticmethod
    def swap_noise(inputs, swap_noise_p =.15):      
        batch_size, n_features = inputs.size()[0], inputs.size()[1]
        idx = range(batch_size)
        n_rows_swap = int(round(batch_size * swap_noise_p))
        inputs = inputs.numpy()

        for feature_ind in range(n_features):
            
            # shuffle column to sample from
            sampling_feature = np.random.permutation(inputs[:, feature_ind])
            
            # randomly select indices to swap
            swap_idx = np.random.choice(idx, size=n_rows_swap)
            
            # replacing column wise values from sampled column
            inputs[swap_idx, feature_ind] = np.random.choice(sampling_feature, size=n_rows_swap)
            
        return torch.from_numpy(inputs)

    def train(self, data_loader):
        self.model.train()
        final_loss = 0
        for data in data_loader:    
            self.optimizer.zero_grad()   
            inputs = data["x"].to(self.device)
            targets = data["y"].to(self.device)
            outputs = self.model(inputs)
            loss = self.loss_fn(outputs, targets)                       
            loss.backward()
            self.optimizer.step()
            final_loss += loss.item()
            
        return final_loss / len(data_loader)

    def dae_train(self, data_loader):
        self.model.train()
        final_loss = 0
        for data in data_loader:    
            self.optimizer.zero_grad()   
            targets = data["x"].to(self.device) 
            aug_inputs = self.swap_noise(data["x"], .15).to(self.device) 
            outputs = self.model(aug_inputs)
            loss = nn.MSELoss()(outputs, targets)                  
            loss.backward()
            self.optimizer.step()
            final_loss += loss.item()
        return final_loss / len(data_loader)

    def evaluate(self, data_loader):
        self.model.eval()
        final_loss = 0
        for data in data_loader:
            inputs = data["x"].to(self.device)
            targets = data["y"].to(self.device)
            outputs = self.model(inputs)
            loss = self.loss_fn(outputs, targets)
            final_loss += loss.item()
        return final_loss / len(data_loader)

    @staticmethod
    def plot_learn_curves(epoch_losses, time_stamp):
        plt.ioff()
        fig = plt.figure()
        plt.style.use('seaborn')
        plt.xlabel('Epochs')
        plt.ylabel('MSE')
        plt.title(f'Learning curves - {time_stamp}')
        plt.plot(epoch_losses)
        plt.savefig(f'{config_dae.PLOT_PATH}{script_name}_LC_{time_stamp}.png')
        plt.close(fig)




Define objective function:

Our objective function below is a training loop that fits the entire dataset to our model over 1000 epochs or until we overfit invoke early stopping.

In addition, this objective function can be repeatedly ran with different network topology parameters and hyperparameters to optimize our network topology. We use the optimization package 'optuna' to optimize this objective function over the parameter and hyperparameter space containing the following dimension: number of network layers, number of layer nodes, dropout, batch normalization, optimizer, and learning rate.


In [ ]:
def objective(trial, 
              param_ranges_dict,
              save_model = False, 
              plot_learning_curves = False
             ):  
    
    objective_start_time = f'{datetime.datetime.now().strftime("%Y%m%d %H:%M:%S")}'
    
    print(f"TRIAL {len(study.trials)} START TIME: {objective_start_time}\n")
    
    #------- RUN TRAINING LOOP ----------
    total_loss = 0
    fold_losses = []
    epoch_losses_by_fold = []

    for fold in range(5):
        train_df = train[train.kfold != fold].reset_index(drop=True)
        valid_df = train[train.kfold == fold].reset_index(drop=True)

        x_train = train_df[feature_columns].drop('kfold', axis=1).to_numpy()
        y_train = train_df[target_columns].to_numpy()

        x_valid = valid_df[feature_columns].drop('kfold', axis=1).to_numpy()
        y_valid = valid_df[target_columns].to_numpy()

        train_dataset = Dataset(features=x_train, targets=y_train)
        valid_dataset = Dataset(features=x_valid, targets=y_valid)

        train_loader = torch.utils.data.DataLoader(
            train_dataset, batch_size=64, num_workers=4, shuffle=True
        )

        valid_loader = torch.utils.data.DataLoader(
            valid_dataset, batch_size=64, num_workers=4
        )
        
        # instantiate the model.
        model = define_model(trial, param_ranges_dict).to(DEVICE)

        # Generate the learning rate and optimizer
        lr = trial.suggest_float("lr", param_ranges_dict['lr'][0], param_ranges_dict['lr'][1], log=True)        
        optimizer_name = 'Adam'
        optimizer = getattr(optim, optimizer_name)(model.parameters(), lr=lr)

        # Print trial parameters        
        if fold == 0:
            print(f"OPTIMIZER:\n{optimizer}\n")
            
            current_trial = study.trials[len(study.trials) - 1]
            print(f"CURRENT TRIAL PARAMETERS: ")    
            for key, value in current_trial.params.items():
                print("    {}: {}".format(key, value))
            print("\n")
        
        # Training of the model.
        eng = Engine(model, optimizer, device=DEVICE)

        best_loss = np.inf
        early_stopping_iter = 10
        early_stopping_counter = 0 
        
        epoch_losses = []
        for epoch in range(EPOCHS):
            
            train_loss = eng.train(train_loader)
            valid_loss = eng.evaluate(valid_loader)         
  
            if (epoch % 10 == 0) or (early_stopping_counter >= 25):
                print(
                    f"FOLD: {fold} -- EPOCH: {epoch:.0f} -- TRAIN RMSE LOSS: {train_loss:.4f} -- "
                    f"VALID RMSE LOSS: {valid_loss:.4f} -- {datetime.datetime.now().strftime('%Y%m%d%H:%M:%S')}" 
                )

            if valid_loss < best_loss:
                best_loss = valid_loss
                early_stopping_counter = 0
                if save_model:
                    torch.save(model.state_dict(), f"models/nn_{trial.number}_{objective_start_time}.bin")
            else:
                early_stopping_counter += 1

            if early_stopping_counter > early_stopping_iter:
                break
        
        print(
            f"FOLD: {fold} ----- BEST VALID RMSE LOSS: {best_loss} ----- "
            f"{datetime.datetime.now().strftime('%Y%m%d %H:%M:%S')}\n"
        )

        fold_losses.append(best_loss)
        total_loss += best_loss
    

    # total cv loss
    CV_loss = total_loss/5
    print('FOLDS:')
    print(f" FOLD 0: {fold_losses[0]:.5f}")
    print(f" FOLD 1: {fold_losses[1]:.5f}")
    print(f" FOLD 2: {fold_losses[2]:.5f}")
    print(f" FOLD 3: {fold_losses[3]:.5f}")
    print(f" FOLD 4: {fold_losses[4]:.5f}\n")
    
    print(f"CROSS VALIDATION SCORE: ")
    print(f" {CV_loss:.10f} ----- {datetime.datetime.now().strftime('%Y%m%d %H:%M')}\n")  
    
    if len(study.trials) > 1:
        print("**************")
        print("Best CV trial:")
        best_trial_temp = study.best_trial

        print(f"  Value:  {best_trial_temp.value}\n")
        print("  Params: ")
        for key, value in trial.params.items():
            print("    {}: {}".format(key, value))
        print("**************\n")
    print(f"{100*'#'}\n")
        
    trials_df = study.trials_dataframe()
    trials_df.to_csv(f'logs/nn_trials_{study_start_time}.csv')

    return CV_loss

Assign folds for neural network training:


In [ ]:
df_cat_trans_med["kfold"] = -1
df_cat_trans_med = df_cat_trans_med.reset_index(drop=True)
kf = KFold(n_splits=5, shuffle=False)

for fold, (train_idx, val_idx) in enumerate(kf.split(X=df_cat_trans_med, y=df_cat_trans_med['SALE PRICE'].values)):
    df_cat_trans_med.loc[val_idx, 'kfold'] = fold

feature_columns =  list(X_medium.columns) + ['kfold']
target_columns = y_medium.name

train = df_cat_trans_med[:].copy()
train.kfold.value_counts()
Out[ ]:
1    2468
0    2468
4    2467
3    2467
2    2467
Name: kfold, dtype: int64

Start neural network architecture search (NAS) with respect to RMSE:


In [ ]:
# dictionary for network parameters
optuna_params_dict = {'lr': [.0005, .005],
                      
                      'n_units_l0': [260, 300],
                      'n_units_l1': [170, 240],
                      'n_units_l2': [120, 180],
                      'n_units_l3': [100, 130],
                      'n_units_l4': [70, 100],
                      'n_units_l5': [50, 80],
                      'n_units_l6': [20, 40]}

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
EPOCHS = 1000
number_of_trials = 10

if __name__ == "__main__":  
    study_start_time = f'{datetime.datetime.now().strftime("%Y%m%d %H:%M")}'
    study = optuna.create_study(direction="minimize")
    study.optimize(lambda trial: objective(trial, optuna_params_dict, save_model=False), n_trials=number_of_trials)

    print("Study statistics: ")
    print("  Number of finished trials: ", len(study.trials), "\n")

    print("Best trial:")
    trial = study.best_trial

    print("  Value: ", trial.value)
    print("  Params: ")
    for key, value in trial.params.items():
        print("    {}: {}".format(key, value))
        
    trials_df = study.trials_dataframe()
    print(f"\n\n\nTRIALS DATAFRAME: ")
    print(trials_df)

    joblib.dump(study,f'logs/nn_study_{study_start_time}.pkl')
TRIAL 1 START TIME: 20210917 13:20:03

OPTIMIZER:
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0022993978383444307
    weight_decay: 0
)

CURRENT TRIAL PARAMETERS: 
    n_units_l0: 270
    n_units_l1: 208
    n_units_l2: 149
    n_units_l3: 128
    n_units_l4: 90
    n_units_l5: 62
    n_units_l6: 29
    lr: 0.0022993978383444307


FOLD: 0 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1610087.3343 -- VALID RMSE LOSS: 1386993.2179 -- 2021091713:20:10
FOLD: 0 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1074660.3302 -- VALID RMSE LOSS: 1092375.8830 -- 2021091713:20:21
FOLD: 0 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1049722.9766 -- VALID RMSE LOSS: 1075963.8349 -- 2021091713:20:31
FOLD: 0 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1007276.6661 -- VALID RMSE LOSS: 1083424.6426 -- 2021091713:20:41
FOLD: 0 ----- BEST VALID RMSE LOSS: 1075963.8349358975 ----- 20210917 13:20:42

FOLD: 1 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1601037.8226 -- VALID RMSE LOSS: 1306210.5304 -- 2021091713:20:43
FOLD: 1 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1083486.6782 -- VALID RMSE LOSS: 1077917.8718 -- 2021091713:20:54
FOLD: 1 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1040469.5210 -- VALID RMSE LOSS: 1062094.1026 -- 2021091713:21:04
FOLD: 1 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1012770.2770 -- VALID RMSE LOSS: 1055325.8205 -- 2021091713:21:14
FOLD: 1 -- EPOCH: 40 -- TRAIN RMSE LOSS: 980900.4173 -- VALID RMSE LOSS: 1102626.3942 -- 2021091713:21:25
FOLD: 1 ----- BEST VALID RMSE LOSS: 1049507.7083333333 ----- 20210917 13:21:29

FOLD: 2 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1614085.0214 -- VALID RMSE LOSS: 1394115.4712 -- 2021091713:21:30
FOLD: 2 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1084639.6133 -- VALID RMSE LOSS: 1141316.8974 -- 2021091713:21:41
FOLD: 2 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1049314.1710 -- VALID RMSE LOSS: 1110229.3814 -- 2021091713:21:51
FOLD: 2 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1019108.9524 -- VALID RMSE LOSS: 1060019.7532 -- 2021091713:22:01
FOLD: 2 -- EPOCH: 40 -- TRAIN RMSE LOSS: 991295.1319 -- VALID RMSE LOSS: 1074642.6891 -- 2021091713:22:12
FOLD: 2 -- EPOCH: 50 -- TRAIN RMSE LOSS: 967218.7774 -- VALID RMSE LOSS: 1048883.5417 -- 2021091713:22:22
FOLD: 2 ----- BEST VALID RMSE LOSS: 1047192.6907051282 ----- 20210917 13:22:22

FOLD: 3 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1614474.3750 -- VALID RMSE LOSS: 1392722.6058 -- 2021091713:22:23
FOLD: 3 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1084372.5512 -- VALID RMSE LOSS: 1094093.1362 -- 2021091713:22:34
FOLD: 3 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1045110.6653 -- VALID RMSE LOSS: 1083344.1747 -- 2021091713:22:44
FOLD: 3 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1011183.9694 -- VALID RMSE LOSS: 1083590.1266 -- 2021091713:22:54
FOLD: 3 -- EPOCH: 40 -- TRAIN RMSE LOSS: 988768.8185 -- VALID RMSE LOSS: 1075928.6410 -- 2021091713:23:05
FOLD: 3 ----- BEST VALID RMSE LOSS: 1066381.3525641025 ----- 20210917 13:23:13

FOLD: 4 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1597391.4577 -- VALID RMSE LOSS: 1430099.1891 -- 2021091713:23:14
FOLD: 4 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1079038.9887 -- VALID RMSE LOSS: 1101037.6747 -- 2021091713:23:25
FOLD: 4 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1049692.4085 -- VALID RMSE LOSS: 1090905.9647 -- 2021091713:23:35
FOLD: 4 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1019414.2369 -- VALID RMSE LOSS: 1069071.6218 -- 2021091713:23:45
FOLD: 4 -- EPOCH: 40 -- TRAIN RMSE LOSS: 996741.4448 -- VALID RMSE LOSS: 1075558.6458 -- 2021091713:23:56
FOLD: 4 -- EPOCH: 50 -- TRAIN RMSE LOSS: 972678.2169 -- VALID RMSE LOSS: 1079217.4503 -- 2021091713:24:06
FOLD: 4 ----- BEST VALID RMSE LOSS: 1065691.4647435897 ----- 20210917 13:24:09

FOLDS:
 FOLD 0: 1075963.83494
 FOLD 1: 1049507.70833
 FOLD 2: 1047192.69071
 FOLD 3: 1066381.35256
 FOLD 4: 1065691.46474

CROSS VALIDATION SCORE: 
 1060947.4102564105 ----- 20210917 13:24

####################################################################################################

TRIAL 2 START TIME: 20210917 13:24:10

OPTIMIZER:
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0007783277942771386
    weight_decay: 0
)

CURRENT TRIAL PARAMETERS: 
    n_units_l0: 294
    n_units_l1: 240
    n_units_l2: 167
    n_units_l3: 115
    n_units_l4: 95
    n_units_l5: 59
    n_units_l6: 39
    lr: 0.0007783277942771386


FOLD: 0 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1800952.3690 -- VALID RMSE LOSS: 1452515.9503 -- 2021091713:24:11
FOLD: 0 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1107001.3460 -- VALID RMSE LOSS: 1124011.4439 -- 2021091713:24:21
FOLD: 0 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1078475.5718 -- VALID RMSE LOSS: 1111422.6362 -- 2021091713:24:32
FOLD: 0 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1059634.9367 -- VALID RMSE LOSS: 1091970.5689 -- 2021091713:24:42
FOLD: 0 -- EPOCH: 40 -- TRAIN RMSE LOSS: 1049221.1569 -- VALID RMSE LOSS: 1080889.0032 -- 2021091713:24:53
FOLD: 0 -- EPOCH: 50 -- TRAIN RMSE LOSS: 1025691.0411 -- VALID RMSE LOSS: 1084322.1763 -- 2021091713:25:03
FOLD: 0 ----- BEST VALID RMSE LOSS: 1077168.123397436 ----- 20210917 13:25:11

FOLD: 1 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1775833.7056 -- VALID RMSE LOSS: 1416595.0369 -- 2021091713:25:12
FOLD: 1 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1104827.2226 -- VALID RMSE LOSS: 1092365.4022 -- 2021091713:25:22
FOLD: 1 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1076227.1129 -- VALID RMSE LOSS: 1072408.1490 -- 2021091713:25:33
FOLD: 1 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1060449.9754 -- VALID RMSE LOSS: 1056441.3269 -- 2021091713:25:44
FOLD: 1 -- EPOCH: 40 -- TRAIN RMSE LOSS: 1038403.2226 -- VALID RMSE LOSS: 1051211.8542 -- 2021091713:25:54
FOLD: 1 -- EPOCH: 50 -- TRAIN RMSE LOSS: 1014789.7077 -- VALID RMSE LOSS: 1050241.6891 -- 2021091713:26:05
FOLD: 1 -- EPOCH: 60 -- TRAIN RMSE LOSS: 993338.8294 -- VALID RMSE LOSS: 1054532.3349 -- 2021091713:26:15
FOLD: 1 ----- BEST VALID RMSE LOSS: 1048763.8365384615 ----- 20210917 13:26:17

FOLD: 2 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1770335.3020 -- VALID RMSE LOSS: 1475622.9872 -- 2021091713:26:19
FOLD: 2 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1102723.5935 -- VALID RMSE LOSS: 1096036.6218 -- 2021091713:26:29
FOLD: 2 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1069721.7444 -- VALID RMSE LOSS: 1076244.1298 -- 2021091713:26:40
FOLD: 2 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1059411.2980 -- VALID RMSE LOSS: 1072713.3462 -- 2021091713:26:51
FOLD: 2 -- EPOCH: 40 -- TRAIN RMSE LOSS: 1034605.1766 -- VALID RMSE LOSS: 1059226.0946 -- 2021091713:27:01
FOLD: 2 -- EPOCH: 50 -- TRAIN RMSE LOSS: 1013586.6254 -- VALID RMSE LOSS: 1061355.7035 -- 2021091713:27:12
FOLD: 2 -- EPOCH: 60 -- TRAIN RMSE LOSS: 990440.7843 -- VALID RMSE LOSS: 1054618.2628 -- 2021091713:27:23
FOLD: 2 -- EPOCH: 70 -- TRAIN RMSE LOSS: 979007.1411 -- VALID RMSE LOSS: 1047136.3814 -- 2021091713:27:33
FOLD: 2 -- EPOCH: 80 -- TRAIN RMSE LOSS: 963414.3407 -- VALID RMSE LOSS: 1048765.3750 -- 2021091713:27:44
FOLD: 2 -- EPOCH: 90 -- TRAIN RMSE LOSS: 939883.1790 -- VALID RMSE LOSS: 1045732.1891 -- 2021091713:27:55
FOLD: 2 ----- BEST VALID RMSE LOSS: 1043997.8060897436 ----- 20210917 13:27:55

FOLD: 3 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1758828.0266 -- VALID RMSE LOSS: 1460800.4103 -- 2021091713:27:56
FOLD: 3 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1108752.4065 -- VALID RMSE LOSS: 1134229.0513 -- 2021091713:28:07
FOLD: 3 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1078329.1468 -- VALID RMSE LOSS: 1093423.3782 -- 2021091713:28:17
FOLD: 3 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1055455.6448 -- VALID RMSE LOSS: 1087555.3349 -- 2021091713:28:28
FOLD: 3 -- EPOCH: 40 -- TRAIN RMSE LOSS: 1044466.1298 -- VALID RMSE LOSS: 1070330.4215 -- 2021091713:28:39
FOLD: 3 -- EPOCH: 50 -- TRAIN RMSE LOSS: 1024706.7460 -- VALID RMSE LOSS: 1067347.6506 -- 2021091713:28:50
FOLD: 3 -- EPOCH: 60 -- TRAIN RMSE LOSS: 1013803.2532 -- VALID RMSE LOSS: 1057678.3846 -- 2021091713:29:01
FOLD: 3 ----- BEST VALID RMSE LOSS: 1056725.3397435897 ----- 20210917 13:29:04

FOLD: 4 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1756955.8794 -- VALID RMSE LOSS: 1499160.7628 -- 2021091713:29:05
FOLD: 4 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1101077.9008 -- VALID RMSE LOSS: 1116799.9054 -- 2021091713:29:16
FOLD: 4 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1068731.3198 -- VALID RMSE LOSS: 1099778.5913 -- 2021091713:29:27
FOLD: 4 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1046927.2121 -- VALID RMSE LOSS: 1085726.1282 -- 2021091713:29:38
FOLD: 4 -- EPOCH: 40 -- TRAIN RMSE LOSS: 1031136.6540 -- VALID RMSE LOSS: 1084951.4888 -- 2021091713:29:49
FOLD: 4 -- EPOCH: 50 -- TRAIN RMSE LOSS: 1003798.3496 -- VALID RMSE LOSS: 1074119.0641 -- 2021091713:30:00
FOLD: 4 -- EPOCH: 60 -- TRAIN RMSE LOSS: 980867.7661 -- VALID RMSE LOSS: 1070603.8862 -- 2021091713:30:11
FOLD: 4 -- EPOCH: 70 -- TRAIN RMSE LOSS: 959721.7335 -- VALID RMSE LOSS: 1076894.7083 -- 2021091713:30:22
FOLD: 4 ----- BEST VALID RMSE LOSS: 1060396.3157051282 ----- 20210917 13:30:31

FOLDS:
 FOLD 0: 1077168.12340
 FOLD 1: 1048763.83654
 FOLD 2: 1043997.80609
 FOLD 3: 1056725.33974
 FOLD 4: 1060396.31571

CROSS VALIDATION SCORE: 
 1057410.2842948718 ----- 20210917 13:30

**************
Best CV trial:
  Value:  1060947.4102564105

  Params: 
    n_units_l0: 294
    n_units_l1: 240
    n_units_l2: 167
    n_units_l3: 115
    n_units_l4: 95
    n_units_l5: 59
    n_units_l6: 39
    lr: 0.0007783277942771386
**************

####################################################################################################

TRIAL 3 START TIME: 20210917 13:30:31

OPTIMIZER:
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.000600384947312816
    weight_decay: 0
)

CURRENT TRIAL PARAMETERS: 
    n_units_l0: 291
    n_units_l1: 174
    n_units_l2: 143
    n_units_l3: 115
    n_units_l4: 85
    n_units_l5: 71
    n_units_l6: 30
    lr: 0.000600384947312816


FOLD: 0 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1850569.3669 -- VALID RMSE LOSS: 1458783.1106 -- 2021091713:30:32
FOLD: 0 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1116949.6137 -- VALID RMSE LOSS: 1129008.7692 -- 2021091713:30:43
FOLD: 0 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1082003.3190 -- VALID RMSE LOSS: 1108300.7997 -- 2021091713:30:53
FOLD: 0 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1065302.2948 -- VALID RMSE LOSS: 1092566.8365 -- 2021091713:31:04
FOLD: 0 -- EPOCH: 40 -- TRAIN RMSE LOSS: 1051351.6165 -- VALID RMSE LOSS: 1086877.5593 -- 2021091713:31:15
FOLD: 0 -- EPOCH: 50 -- TRAIN RMSE LOSS: 1040318.3026 -- VALID RMSE LOSS: 1087526.3237 -- 2021091713:31:26
FOLD: 0 -- EPOCH: 60 -- TRAIN RMSE LOSS: 1021846.0226 -- VALID RMSE LOSS: 1078413.1170 -- 2021091713:31:37
FOLD: 0 -- EPOCH: 70 -- TRAIN RMSE LOSS: 1011725.7617 -- VALID RMSE LOSS: 1083162.7756 -- 2021091713:31:48
FOLD: 0 ----- BEST VALID RMSE LOSS: 1078413.1169871795 ----- 20210917 13:31:49

FOLD: 1 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1870199.4452 -- VALID RMSE LOSS: 1426177.3590 -- 2021091713:31:50
FOLD: 1 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1120134.9419 -- VALID RMSE LOSS: 1102686.8173 -- 2021091713:32:01
FOLD: 1 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1086026.0145 -- VALID RMSE LOSS: 1078575.3253 -- 2021091713:32:13
FOLD: 1 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1072651.9476 -- VALID RMSE LOSS: 1064661.1298 -- 2021091713:32:24
FOLD: 1 -- EPOCH: 40 -- TRAIN RMSE LOSS: 1055924.6431 -- VALID RMSE LOSS: 1058886.5321 -- 2021091713:32:35
FOLD: 1 -- EPOCH: 50 -- TRAIN RMSE LOSS: 1041489.0645 -- VALID RMSE LOSS: 1058105.5369 -- 2021091713:32:48
FOLD: 1 -- EPOCH: 60 -- TRAIN RMSE LOSS: 1026636.3565 -- VALID RMSE LOSS: 1049581.4952 -- 2021091713:32:59
FOLD: 1 -- EPOCH: 70 -- TRAIN RMSE LOSS: 1014218.1069 -- VALID RMSE LOSS: 1044247.7196 -- 2021091713:33:10
FOLD: 1 -- EPOCH: 80 -- TRAIN RMSE LOSS: 992858.6859 -- VALID RMSE LOSS: 1034876.4824 -- 2021091713:33:21
FOLD: 1 -- EPOCH: 90 -- TRAIN RMSE LOSS: 979118.0246 -- VALID RMSE LOSS: 1061006.8173 -- 2021091713:33:32
FOLD: 1 ----- BEST VALID RMSE LOSS: 1034876.4823717949 ----- 20210917 13:33:33

FOLD: 2 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1857707.7964 -- VALID RMSE LOSS: 1487743.3462 -- 2021091713:33:35
FOLD: 2 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1115023.3766 -- VALID RMSE LOSS: 1112485.5160 -- 2021091713:33:46
FOLD: 2 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1086052.5778 -- VALID RMSE LOSS: 1086361.6314 -- 2021091713:33:57
FOLD: 2 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1062106.6641 -- VALID RMSE LOSS: 1080290.9647 -- 2021091713:34:08
FOLD: 2 -- EPOCH: 40 -- TRAIN RMSE LOSS: 1052589.2173 -- VALID RMSE LOSS: 1073202.4503 -- 2021091713:34:20
FOLD: 2 -- EPOCH: 50 -- TRAIN RMSE LOSS: 1037290.4976 -- VALID RMSE LOSS: 1064258.5593 -- 2021091713:34:31
FOLD: 2 -- EPOCH: 60 -- TRAIN RMSE LOSS: 1023330.8290 -- VALID RMSE LOSS: 1051244.5849 -- 2021091713:34:42
FOLD: 2 -- EPOCH: 70 -- TRAIN RMSE LOSS: 1007697.0883 -- VALID RMSE LOSS: 1070111.8109 -- 2021091713:34:53
FOLD: 2 -- EPOCH: 80 -- TRAIN RMSE LOSS: 986198.6141 -- VALID RMSE LOSS: 1048463.2340 -- 2021091713:35:05
FOLD: 2 -- EPOCH: 90 -- TRAIN RMSE LOSS: 968605.8153 -- VALID RMSE LOSS: 1048297.5946 -- 2021091713:35:16
FOLD: 2 -- EPOCH: 100 -- TRAIN RMSE LOSS: 956286.5565 -- VALID RMSE LOSS: 1045962.9359 -- 2021091713:35:27
FOLD: 2 ----- BEST VALID RMSE LOSS: 1041788.3493589744 ----- 20210917 13:35:35

FOLD: 3 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1859140.5702 -- VALID RMSE LOSS: 1455453.8558 -- 2021091713:35:37
FOLD: 3 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1115474.1573 -- VALID RMSE LOSS: 1152745.8750 -- 2021091713:35:48
FOLD: 3 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1086234.0472 -- VALID RMSE LOSS: 1113394.0224 -- 2021091713:35:59
FOLD: 3 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1073078.3923 -- VALID RMSE LOSS: 1093643.0048 -- 2021091713:36:11
FOLD: 3 -- EPOCH: 40 -- TRAIN RMSE LOSS: 1058839.0629 -- VALID RMSE LOSS: 1083837.8686 -- 2021091713:36:22
FOLD: 3 -- EPOCH: 50 -- TRAIN RMSE LOSS: 1048890.1133 -- VALID RMSE LOSS: 1075805.1314 -- 2021091713:36:34
FOLD: 3 -- EPOCH: 60 -- TRAIN RMSE LOSS: 1036765.2371 -- VALID RMSE LOSS: 1071086.7580 -- 2021091713:36:45
FOLD: 3 -- EPOCH: 70 -- TRAIN RMSE LOSS: 1022476.1714 -- VALID RMSE LOSS: 1074205.1875 -- 2021091713:36:57
FOLD: 3 -- EPOCH: 80 -- TRAIN RMSE LOSS: 1011382.6871 -- VALID RMSE LOSS: 1071597.0769 -- 2021091713:37:09
FOLD: 3 -- EPOCH: 90 -- TRAIN RMSE LOSS: 993230.6806 -- VALID RMSE LOSS: 1054520.4391 -- 2021091713:37:20
FOLD: 3 -- EPOCH: 100 -- TRAIN RMSE LOSS: 985486.9976 -- VALID RMSE LOSS: 1051065.2532 -- 2021091713:37:32
FOLD: 3 -- EPOCH: 110 -- TRAIN RMSE LOSS: 969567.5089 -- VALID RMSE LOSS: 1067202.0561 -- 2021091713:37:43
FOLD: 3 ----- BEST VALID RMSE LOSS: 1049113.7996794872 ----- 20210917 13:37:50

FOLD: 4 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1826900.8016 -- VALID RMSE LOSS: 1506120.4487 -- 2021091713:37:51
FOLD: 4 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1107850.6367 -- VALID RMSE LOSS: 1136167.2484 -- 2021091713:38:03
FOLD: 4 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1073081.3141 -- VALID RMSE LOSS: 1107975.6538 -- 2021091713:38:14
FOLD: 4 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1057490.8665 -- VALID RMSE LOSS: 1096984.4904 -- 2021091713:38:26
FOLD: 4 -- EPOCH: 40 -- TRAIN RMSE LOSS: 1046822.9996 -- VALID RMSE LOSS: 1086088.8862 -- 2021091713:38:38
FOLD: 4 -- EPOCH: 50 -- TRAIN RMSE LOSS: 1031778.9766 -- VALID RMSE LOSS: 1108808.6362 -- 2021091713:38:50
FOLD: 4 -- EPOCH: 60 -- TRAIN RMSE LOSS: 1015647.0810 -- VALID RMSE LOSS: 1111929.8125 -- 2021091713:39:01
FOLD: 4 -- EPOCH: 70 -- TRAIN RMSE LOSS: 997920.8613 -- VALID RMSE LOSS: 1082553.8494 -- 2021091713:39:13
FOLD: 4 -- EPOCH: 80 -- TRAIN RMSE LOSS: 982607.4964 -- VALID RMSE LOSS: 1068573.4391 -- 2021091713:39:25
FOLD: 4 -- EPOCH: 90 -- TRAIN RMSE LOSS: 963737.3133 -- VALID RMSE LOSS: 1072917.8558 -- 2021091713:39:37
FOLD: 4 ----- BEST VALID RMSE LOSS: 1059432.0080128205 ----- 20210917 13:39:42

FOLDS:
 FOLD 0: 1078413.11699
 FOLD 1: 1034876.48237
 FOLD 2: 1041788.34936
 FOLD 3: 1049113.79968
 FOLD 4: 1059432.00801

CROSS VALIDATION SCORE: 
 1052724.7512820512 ----- 20210917 13:39

**************
Best CV trial:
  Value:  1057410.2842948718

  Params: 
    n_units_l0: 291
    n_units_l1: 174
    n_units_l2: 143
    n_units_l3: 115
    n_units_l4: 85
    n_units_l5: 71
    n_units_l6: 30
    lr: 0.000600384947312816
**************

####################################################################################################

TRIAL 4 START TIME: 20210917 13:39:42

OPTIMIZER:
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.004810808447865234
    weight_decay: 0
)

CURRENT TRIAL PARAMETERS: 
    n_units_l0: 286
    n_units_l1: 180
    n_units_l2: 163
    n_units_l3: 118
    n_units_l4: 87
    n_units_l5: 76
    n_units_l6: 30
    lr: 0.004810808447865234


FOLD: 0 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1521132.2133 -- VALID RMSE LOSS: 1340206.4487 -- 2021091713:39:44
FOLD: 0 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1068312.2133 -- VALID RMSE LOSS: 1097274.8638 -- 2021091713:39:55
FOLD: 0 ----- BEST VALID RMSE LOSS: 1092060.5080128205 ----- 20210917 13:40:05

FOLD: 1 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1504366.1077 -- VALID RMSE LOSS: 1180387.0128 -- 2021091713:40:06
FOLD: 1 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1078310.9456 -- VALID RMSE LOSS: 1115595.1314 -- 2021091713:40:18
FOLD: 1 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1034294.2528 -- VALID RMSE LOSS: 1061075.3846 -- 2021091713:40:30
FOLD: 1 -- EPOCH: 30 -- TRAIN RMSE LOSS: 997087.9819 -- VALID RMSE LOSS: 1128334.9487 -- 2021091713:40:41
FOLD: 1 ----- BEST VALID RMSE LOSS: 1048389.8653846154 ----- 20210917 13:40:49

FOLD: 2 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1524063.2980 -- VALID RMSE LOSS: 1238630.6827 -- 2021091713:40:50
FOLD: 2 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1065299.6016 -- VALID RMSE LOSS: 1149129.5064 -- 2021091713:41:02
FOLD: 2 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1034288.7339 -- VALID RMSE LOSS: 1176416.1875 -- 2021091713:41:14
FOLD: 2 -- EPOCH: 30 -- TRAIN RMSE LOSS: 994102.6681 -- VALID RMSE LOSS: 1081773.0304 -- 2021091713:41:25
FOLD: 2 ----- BEST VALID RMSE LOSS: 1062271.282051282 ----- 20210917 13:41:29

FOLD: 3 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1453072.3508 -- VALID RMSE LOSS: 1218233.5000 -- 2021091713:41:30
FOLD: 3 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1071298.9895 -- VALID RMSE LOSS: 1080394.9856 -- 2021091713:41:42
FOLD: 3 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1031007.4980 -- VALID RMSE LOSS: 1124930.8494 -- 2021091713:41:54
FOLD: 3 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1007824.6343 -- VALID RMSE LOSS: 1064736.4631 -- 2021091713:42:06
FOLD: 3 -- EPOCH: 40 -- TRAIN RMSE LOSS: 979803.6036 -- VALID RMSE LOSS: 1083852.2917 -- 2021091713:42:18
FOLD: 3 ----- BEST VALID RMSE LOSS: 1057304.876602564 ----- 20210917 13:42:18

FOLD: 4 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1474421.8710 -- VALID RMSE LOSS: 1214034.6939 -- 2021091713:42:20
FOLD: 4 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1070650.2077 -- VALID RMSE LOSS: 1098797.2596 -- 2021091713:42:32
FOLD: 4 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1024588.0944 -- VALID RMSE LOSS: 1083785.3237 -- 2021091713:42:44
FOLD: 4 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1010425.1379 -- VALID RMSE LOSS: 1112091.0978 -- 2021091713:42:56
FOLD: 4 ----- BEST VALID RMSE LOSS: 1067055.233974359 ----- 20210917 13:43:05

FOLDS:
 FOLD 0: 1092060.50801
 FOLD 1: 1048389.86538
 FOLD 2: 1062271.28205
 FOLD 3: 1057304.87660
 FOLD 4: 1067055.23397

CROSS VALIDATION SCORE: 
 1065416.3532051281 ----- 20210917 13:43

**************
Best CV trial:
  Value:  1052724.7512820512

  Params: 
    n_units_l0: 286
    n_units_l1: 180
    n_units_l2: 163
    n_units_l3: 118
    n_units_l4: 87
    n_units_l5: 76
    n_units_l6: 30
    lr: 0.004810808447865234
**************

####################################################################################################

TRIAL 5 START TIME: 20210917 13:43:05

OPTIMIZER:
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.00092241576721509
    weight_decay: 0
)

CURRENT TRIAL PARAMETERS: 
    n_units_l0: 275
    n_units_l1: 170
    n_units_l2: 131
    n_units_l3: 114
    n_units_l4: 91
    n_units_l5: 55
    n_units_l6: 32
    lr: 0.00092241576721509


FOLD: 0 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1756791.2153 -- VALID RMSE LOSS: 1448581.7436 -- 2021091713:43:07
FOLD: 0 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1099294.2093 -- VALID RMSE LOSS: 1118604.7917 -- 2021091713:43:19
FOLD: 0 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1064743.1706 -- VALID RMSE LOSS: 1090197.9215 -- 2021091713:43:31
FOLD: 0 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1043252.9851 -- VALID RMSE LOSS: 1080221.8061 -- 2021091713:43:43
FOLD: 0 -- EPOCH: 40 -- TRAIN RMSE LOSS: 1019043.2935 -- VALID RMSE LOSS: 1080746.1170 -- 2021091713:43:55
FOLD: 0 ----- BEST VALID RMSE LOSS: 1075450.5576923077 ----- 20210917 13:44:03

FOLD: 1 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1800430.4363 -- VALID RMSE LOSS: 1416929.2901 -- 2021091713:44:04
FOLD: 1 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1104717.0282 -- VALID RMSE LOSS: 1089967.5673 -- 2021091713:44:16
FOLD: 1 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1073273.9956 -- VALID RMSE LOSS: 1069508.5000 -- 2021091713:44:29
FOLD: 1 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1056714.3960 -- VALID RMSE LOSS: 1087296.7997 -- 2021091713:44:41
FOLD: 1 -- EPOCH: 40 -- TRAIN RMSE LOSS: 1045104.1081 -- VALID RMSE LOSS: 1048982.8237 -- 2021091713:44:53
FOLD: 1 -- EPOCH: 50 -- TRAIN RMSE LOSS: 1017998.5510 -- VALID RMSE LOSS: 1048410.6843 -- 2021091713:45:05
FOLD: 1 ----- BEST VALID RMSE LOSS: 1047871.5496794871 ----- 20210917 13:45:13

FOLD: 2 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1767746.4565 -- VALID RMSE LOSS: 1474869.8974 -- 2021091713:45:14
FOLD: 2 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1096421.4673 -- VALID RMSE LOSS: 1096180.1458 -- 2021091713:45:26
FOLD: 2 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1071439.4218 -- VALID RMSE LOSS: 1091065.1538 -- 2021091713:45:38
FOLD: 2 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1053905.7004 -- VALID RMSE LOSS: 1066189.5256 -- 2021091713:45:51
FOLD: 2 -- EPOCH: 40 -- TRAIN RMSE LOSS: 1030539.9629 -- VALID RMSE LOSS: 1059500.4872 -- 2021091713:46:03
FOLD: 2 -- EPOCH: 50 -- TRAIN RMSE LOSS: 1009156.6899 -- VALID RMSE LOSS: 1050938.5913 -- 2021091713:46:15
FOLD: 2 -- EPOCH: 60 -- TRAIN RMSE LOSS: 983662.2681 -- VALID RMSE LOSS: 1078549.2853 -- 2021091713:46:28
FOLD: 2 ----- BEST VALID RMSE LOSS: 1045342.9166666666 ----- 20210917 13:46:30

FOLD: 3 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1771959.4524 -- VALID RMSE LOSS: 1448815.5321 -- 2021091713:46:31
FOLD: 3 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1102556.0988 -- VALID RMSE LOSS: 1124491.7115 -- 2021091713:46:44
FOLD: 3 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1075368.0786 -- VALID RMSE LOSS: 1095722.9968 -- 2021091713:46:56
FOLD: 3 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1059269.5419 -- VALID RMSE LOSS: 1082459.4022 -- 2021091713:47:08
FOLD: 3 -- EPOCH: 40 -- TRAIN RMSE LOSS: 1039188.3810 -- VALID RMSE LOSS: 1074805.9551 -- 2021091713:47:21
FOLD: 3 -- EPOCH: 50 -- TRAIN RMSE LOSS: 1029929.0613 -- VALID RMSE LOSS: 1071825.1731 -- 2021091713:47:33
FOLD: 3 -- EPOCH: 60 -- TRAIN RMSE LOSS: 1004110.1089 -- VALID RMSE LOSS: 1068158.6827 -- 2021091713:47:46
FOLD: 3 -- EPOCH: 70 -- TRAIN RMSE LOSS: 997354.4214 -- VALID RMSE LOSS: 1060226.7420 -- 2021091713:47:58
FOLD: 3 ----- BEST VALID RMSE LOSS: 1050498.998397436 ----- 20210917 13:48:06

FOLD: 4 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1770766.4810 -- VALID RMSE LOSS: 1496894.2212 -- 2021091713:48:07
FOLD: 4 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1096868.2375 -- VALID RMSE LOSS: 1116529.2756 -- 2021091713:48:19
FOLD: 4 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1064005.6407 -- VALID RMSE LOSS: 1097360.3301 -- 2021091713:48:32
FOLD: 4 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1050220.0798 -- VALID RMSE LOSS: 1088061.4696 -- 2021091713:48:44
FOLD: 4 -- EPOCH: 40 -- TRAIN RMSE LOSS: 1026130.8883 -- VALID RMSE LOSS: 1082366.6026 -- 2021091713:48:57
FOLD: 4 -- EPOCH: 50 -- TRAIN RMSE LOSS: 1006181.5794 -- VALID RMSE LOSS: 1073716.6442 -- 2021091713:49:09
FOLD: 4 -- EPOCH: 60 -- TRAIN RMSE LOSS: 985611.3742 -- VALID RMSE LOSS: 1082399.5529 -- 2021091713:49:21
FOLD: 4 -- EPOCH: 70 -- TRAIN RMSE LOSS: 970239.5536 -- VALID RMSE LOSS: 1070201.4696 -- 2021091713:49:34
FOLD: 4 ----- BEST VALID RMSE LOSS: 1062155.3958333333 ----- 20210917 13:49:45

FOLDS:
 FOLD 0: 1075450.55769
 FOLD 1: 1047871.54968
 FOLD 2: 1045342.91667
 FOLD 3: 1050498.99840
 FOLD 4: 1062155.39583

CROSS VALIDATION SCORE: 
 1056263.8836538461 ----- 20210917 13:49

**************
Best CV trial:
  Value:  1052724.7512820512

  Params: 
    n_units_l0: 275
    n_units_l1: 170
    n_units_l2: 131
    n_units_l3: 114
    n_units_l4: 91
    n_units_l5: 55
    n_units_l6: 32
    lr: 0.00092241576721509
**************

####################################################################################################

TRIAL 6 START TIME: 20210917 13:49:45

OPTIMIZER:
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.002525392383415848
    weight_decay: 0
)

CURRENT TRIAL PARAMETERS: 
    n_units_l0: 275
    n_units_l1: 239
    n_units_l2: 128
    n_units_l3: 109
    n_units_l4: 100
    n_units_l5: 71
    n_units_l6: 21
    lr: 0.002525392383415848


FOLD: 0 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1635285.9331 -- VALID RMSE LOSS: 1410098.5288 -- 2021091713:49:47
FOLD: 0 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1083404.9302 -- VALID RMSE LOSS: 1113111.1282 -- 2021091713:49:59
FOLD: 0 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1051886.5839 -- VALID RMSE LOSS: 1092441.3478 -- 2021091713:50:12
FOLD: 0 ----- BEST VALID RMSE LOSS: 1083924.0240384615 ----- 20210917 13:50:20

FOLD: 1 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1636576.4234 -- VALID RMSE LOSS: 1361231.5272 -- 2021091713:50:22
FOLD: 1 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1080849.5996 -- VALID RMSE LOSS: 1085230.8926 -- 2021091713:50:34
FOLD: 1 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1052881.5165 -- VALID RMSE LOSS: 1056233.3045 -- 2021091713:50:47
FOLD: 1 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1017448.5891 -- VALID RMSE LOSS: 1096329.1891 -- 2021091713:50:59
FOLD: 1 ----- BEST VALID RMSE LOSS: 1048750.125 ----- 20210917 13:51:11

FOLD: 2 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1601405.8548 -- VALID RMSE LOSS: 1398262.4295 -- 2021091713:51:12
FOLD: 2 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1079069.9444 -- VALID RMSE LOSS: 1096692.7404 -- 2021091713:51:24
FOLD: 2 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1043923.0722 -- VALID RMSE LOSS: 1068705.6587 -- 2021091713:51:37
FOLD: 2 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1012166.4407 -- VALID RMSE LOSS: 1060517.6026 -- 2021091713:51:50
FOLD: 2 -- EPOCH: 40 -- TRAIN RMSE LOSS: 980523.1980 -- VALID RMSE LOSS: 1066516.9375 -- 2021091713:52:03
FOLD: 2 -- EPOCH: 50 -- TRAIN RMSE LOSS: 960451.7077 -- VALID RMSE LOSS: 1052741.3782 -- 2021091713:52:15
FOLD: 2 -- EPOCH: 60 -- TRAIN RMSE LOSS: 948409.3464 -- VALID RMSE LOSS: 1069720.4423 -- 2021091713:52:28
FOLD: 2 ----- BEST VALID RMSE LOSS: 1046407.641025641 ----- 20210917 13:52:38

FOLD: 3 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1614925.3000 -- VALID RMSE LOSS: 1394337.6026 -- 2021091713:52:39
FOLD: 3 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1074215.8347 -- VALID RMSE LOSS: 1093818.8846 -- 2021091713:52:52
FOLD: 3 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1042085.5581 -- VALID RMSE LOSS: 1073883.9006 -- 2021091713:53:05
FOLD: 3 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1014537.5276 -- VALID RMSE LOSS: 1079140.9984 -- 2021091713:53:18
FOLD: 3 -- EPOCH: 40 -- TRAIN RMSE LOSS: 985148.7226 -- VALID RMSE LOSS: 1074402.6859 -- 2021091713:53:30
FOLD: 3 ----- BEST VALID RMSE LOSS: 1059731.0833333333 ----- 20210917 13:53:34

FOLD: 4 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1584307.8222 -- VALID RMSE LOSS: 1408591.9519 -- 2021091713:53:35
FOLD: 4 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1076529.0766 -- VALID RMSE LOSS: 1094014.2853 -- 2021091713:53:48
FOLD: 4 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1044360.8935 -- VALID RMSE LOSS: 1093247.6875 -- 2021091713:54:01
FOLD: 4 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1016115.8956 -- VALID RMSE LOSS: 1090703.1138 -- 2021091713:54:14
FOLD: 4 -- EPOCH: 40 -- TRAIN RMSE LOSS: 988629.8310 -- VALID RMSE LOSS: 1067599.4006 -- 2021091713:54:29
FOLD: 4 -- EPOCH: 50 -- TRAIN RMSE LOSS: 964172.7073 -- VALID RMSE LOSS: 1083123.8766 -- 2021091713:54:42
FOLD: 4 ----- BEST VALID RMSE LOSS: 1067599.4006410257 ----- 20210917 13:54:43

FOLDS:
 FOLD 0: 1083924.02404
 FOLD 1: 1048750.12500
 FOLD 2: 1046407.64103
 FOLD 3: 1059731.08333
 FOLD 4: 1067599.40064

CROSS VALIDATION SCORE: 
 1061282.4548076924 ----- 20210917 13:54

**************
Best CV trial:
  Value:  1052724.7512820512

  Params: 
    n_units_l0: 275
    n_units_l1: 239
    n_units_l2: 128
    n_units_l3: 109
    n_units_l4: 100
    n_units_l5: 71
    n_units_l6: 21
    lr: 0.002525392383415848
**************

####################################################################################################

TRIAL 7 START TIME: 20210917 13:54:43

OPTIMIZER:
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0029310656667931676
    weight_decay: 0
)

CURRENT TRIAL PARAMETERS: 
    n_units_l0: 297
    n_units_l1: 210
    n_units_l2: 126
    n_units_l3: 101
    n_units_l4: 70
    n_units_l5: 80
    n_units_l6: 25
    lr: 0.0029310656667931676


FOLD: 0 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1603506.6984 -- VALID RMSE LOSS: 1373651.9567 -- 2021091713:54:45
FOLD: 0 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1072819.0452 -- VALID RMSE LOSS: 1099695.7500 -- 2021091713:54:57
FOLD: 0 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1038500.6577 -- VALID RMSE LOSS: 1089673.6170 -- 2021091713:55:10
FOLD: 0 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1007460.5677 -- VALID RMSE LOSS: 1103724.2308 -- 2021091713:55:23
FOLD: 0 ----- BEST VALID RMSE LOSS: 1076976.7291666667 ----- 20210917 13:55:28

FOLD: 1 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1600164.7411 -- VALID RMSE LOSS: 1303721.8494 -- 2021091713:55:30
FOLD: 1 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1079503.9649 -- VALID RMSE LOSS: 1070383.6506 -- 2021091713:55:43
FOLD: 1 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1041459.0790 -- VALID RMSE LOSS: 1049198.1667 -- 2021091713:55:56
FOLD: 1 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1004183.6069 -- VALID RMSE LOSS: 1083031.6474 -- 2021091713:56:09
FOLD: 1 ----- BEST VALID RMSE LOSS: 1044424.3413461539 ----- 20210917 13:56:09

FOLD: 2 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1580466.5597 -- VALID RMSE LOSS: 1365288.9968 -- 2021091713:56:10
FOLD: 2 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1075517.1859 -- VALID RMSE LOSS: 1109929.6827 -- 2021091713:56:23
FOLD: 2 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1039278.8048 -- VALID RMSE LOSS: 1075831.9407 -- 2021091713:56:36
FOLD: 2 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1000358.8565 -- VALID RMSE LOSS: 1083470.6314 -- 2021091713:56:49
FOLD: 2 -- EPOCH: 40 -- TRAIN RMSE LOSS: 971430.4440 -- VALID RMSE LOSS: 1161838.4551 -- 2021091713:57:02
FOLD: 2 ----- BEST VALID RMSE LOSS: 1043993.2467948718 ----- 20210917 13:57:12

FOLD: 3 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1602330.3081 -- VALID RMSE LOSS: 1369117.2308 -- 2021091713:57:14
FOLD: 3 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1075780.0085 -- VALID RMSE LOSS: 1116577.6907 -- 2021091713:57:27
FOLD: 3 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1042085.8315 -- VALID RMSE LOSS: 1072807.5801 -- 2021091713:57:40
FOLD: 3 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1016604.6673 -- VALID RMSE LOSS: 1065664.8878 -- 2021091713:57:53
FOLD: 3 -- EPOCH: 40 -- TRAIN RMSE LOSS: 990779.8431 -- VALID RMSE LOSS: 1075262.8413 -- 2021091713:58:06
FOLD: 3 -- EPOCH: 50 -- TRAIN RMSE LOSS: 967490.8665 -- VALID RMSE LOSS: 1064438.1266 -- 2021091713:58:19
FOLD: 3 ----- BEST VALID RMSE LOSS: 1053373.4887820513 ----- 20210917 13:58:19

FOLD: 4 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1581397.8891 -- VALID RMSE LOSS: 1413862.2917 -- 2021091713:58:20
FOLD: 4 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1074760.6149 -- VALID RMSE LOSS: 1109828.7548 -- 2021091713:58:34
FOLD: 4 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1038412.1020 -- VALID RMSE LOSS: 1088033.9231 -- 2021091713:58:47
FOLD: 4 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1008808.4371 -- VALID RMSE LOSS: 1091615.3045 -- 2021091713:59:00
FOLD: 4 ----- BEST VALID RMSE LOSS: 1066985.4567307692 ----- 20210917 13:59:07

FOLDS:
 FOLD 0: 1076976.72917
 FOLD 1: 1044424.34135
 FOLD 2: 1043993.24679
 FOLD 3: 1053373.48878
 FOLD 4: 1066985.45673

CROSS VALIDATION SCORE: 
 1057150.6525641023 ----- 20210917 13:59

**************
Best CV trial:
  Value:  1052724.7512820512

  Params: 
    n_units_l0: 297
    n_units_l1: 210
    n_units_l2: 126
    n_units_l3: 101
    n_units_l4: 70
    n_units_l5: 80
    n_units_l6: 25
    lr: 0.0029310656667931676
**************

####################################################################################################

TRIAL 8 START TIME: 20210917 13:59:07

OPTIMIZER:
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0020818706470803862
    weight_decay: 0
)

CURRENT TRIAL PARAMETERS: 
    n_units_l0: 272
    n_units_l1: 238
    n_units_l2: 174
    n_units_l3: 108
    n_units_l4: 74
    n_units_l5: 79
    n_units_l6: 34
    lr: 0.0020818706470803862


FOLD: 0 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1613981.4431 -- VALID RMSE LOSS: 1381451.0208 -- 2021091713:59:08
FOLD: 0 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1074402.0347 -- VALID RMSE LOSS: 1097635.7276 -- 2021091713:59:21
FOLD: 0 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1039952.6444 -- VALID RMSE LOSS: 1088035.1186 -- 2021091713:59:35
FOLD: 0 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1010438.8129 -- VALID RMSE LOSS: 1079166.7644 -- 2021091713:59:48
FOLD: 0 -- EPOCH: 40 -- TRAIN RMSE LOSS: 984274.2157 -- VALID RMSE LOSS: 1079547.6571 -- 2021091714:00:01
FOLD: 0 -- EPOCH: 50 -- TRAIN RMSE LOSS: 956334.8016 -- VALID RMSE LOSS: 1091443.6939 -- 2021091714:00:15
FOLD: 0 ----- BEST VALID RMSE LOSS: 1076593.0096153845 ----- 20210917 14:00:19

FOLD: 1 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1618394.1512 -- VALID RMSE LOSS: 1373491.1186 -- 2021091714:00:20
FOLD: 1 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1090544.1548 -- VALID RMSE LOSS: 1071557.5545 -- 2021091714:00:33
FOLD: 1 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1054858.4754 -- VALID RMSE LOSS: 1052950.5449 -- 2021091714:00:47
FOLD: 1 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1020632.4625 -- VALID RMSE LOSS: 1042099.0304 -- 2021091714:01:00
FOLD: 1 -- EPOCH: 40 -- TRAIN RMSE LOSS: 986788.6512 -- VALID RMSE LOSS: 1050798.3718 -- 2021091714:01:13
FOLD: 1 ----- BEST VALID RMSE LOSS: 1042099.030448718 ----- 20210917 14:01:15

FOLD: 2 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1616591.0032 -- VALID RMSE LOSS: 1423807.4359 -- 2021091714:01:16
FOLD: 2 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1088045.8423 -- VALID RMSE LOSS: 1127323.7244 -- 2021091714:01:29
FOLD: 2 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1045421.4782 -- VALID RMSE LOSS: 1056603.0593 -- 2021091714:01:43
FOLD: 2 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1009747.1077 -- VALID RMSE LOSS: 1098205.9343 -- 2021091714:01:56
FOLD: 2 -- EPOCH: 40 -- TRAIN RMSE LOSS: 981799.4901 -- VALID RMSE LOSS: 1089420.0609 -- 2021091714:02:10
FOLD: 2 ----- BEST VALID RMSE LOSS: 1051312.9503205128 ----- 20210917 14:02:15

FOLD: 3 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1599308.6399 -- VALID RMSE LOSS: 1399219.9679 -- 2021091714:02:17
FOLD: 3 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1087262.7306 -- VALID RMSE LOSS: 1119772.0256 -- 2021091714:02:30
FOLD: 3 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1048375.7960 -- VALID RMSE LOSS: 1104085.9022 -- 2021091714:02:44
FOLD: 3 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1017564.7222 -- VALID RMSE LOSS: 1067085.0913 -- 2021091714:02:57
FOLD: 3 -- EPOCH: 40 -- TRAIN RMSE LOSS: 989856.2895 -- VALID RMSE LOSS: 1060905.1346 -- 2021091714:03:11
FOLD: 3 -- EPOCH: 50 -- TRAIN RMSE LOSS: 965112.6421 -- VALID RMSE LOSS: 1062321.2131 -- 2021091714:03:24
FOLD: 3 -- EPOCH: 60 -- TRAIN RMSE LOSS: 953482.5968 -- VALID RMSE LOSS: 1065293.3077 -- 2021091714:03:38
FOLD: 3 ----- BEST VALID RMSE LOSS: 1049595.516025641 ----- 20210917 14:03:38

FOLD: 4 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1612590.5323 -- VALID RMSE LOSS: 1413595.0449 -- 2021091714:03:39
FOLD: 4 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1085465.5056 -- VALID RMSE LOSS: 1141896.9792 -- 2021091714:03:53
FOLD: 4 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1047208.1355 -- VALID RMSE LOSS: 1097009.9808 -- 2021091714:04:07
FOLD: 4 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1018174.9141 -- VALID RMSE LOSS: 1083519.3237 -- 2021091714:04:20
FOLD: 4 -- EPOCH: 40 -- TRAIN RMSE LOSS: 994775.8714 -- VALID RMSE LOSS: 1070387.9103 -- 2021091714:04:34
FOLD: 4 -- EPOCH: 50 -- TRAIN RMSE LOSS: 960065.3982 -- VALID RMSE LOSS: 1076078.7580 -- 2021091714:04:48
FOLD: 4 ----- BEST VALID RMSE LOSS: 1063145.1474358975 ----- 20210917 14:04:56

FOLDS:
 FOLD 0: 1076593.00962
 FOLD 1: 1042099.03045
 FOLD 2: 1051312.95032
 FOLD 3: 1049595.51603
 FOLD 4: 1063145.14744

CROSS VALIDATION SCORE: 
 1056549.1307692309 ----- 20210917 14:04

**************
Best CV trial:
  Value:  1052724.7512820512

  Params: 
    n_units_l0: 272
    n_units_l1: 238
    n_units_l2: 174
    n_units_l3: 108
    n_units_l4: 74
    n_units_l5: 79
    n_units_l6: 34
    lr: 0.0020818706470803862
**************

####################################################################################################

TRIAL 9 START TIME: 20210917 14:04:56

OPTIMIZER:
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0041196088163310786
    weight_decay: 0
)

CURRENT TRIAL PARAMETERS: 
    n_units_l0: 288
    n_units_l1: 201
    n_units_l2: 135
    n_units_l3: 125
    n_units_l4: 88
    n_units_l5: 63
    n_units_l6: 34
    lr: 0.0041196088163310786


FOLD: 0 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1564496.9109 -- VALID RMSE LOSS: 1329788.9455 -- 2021091714:04:58
FOLD: 0 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1068267.8202 -- VALID RMSE LOSS: 1106616.4920 -- 2021091714:05:11
FOLD: 0 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1036281.5843 -- VALID RMSE LOSS: 1088794.9840 -- 2021091714:05:25
FOLD: 0 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1002081.2645 -- VALID RMSE LOSS: 1081200.1298 -- 2021091714:05:39
FOLD: 0 ----- BEST VALID RMSE LOSS: 1076085.860576923 ----- 20210917 14:05:48

FOLD: 1 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1518983.9694 -- VALID RMSE LOSS: 1188719.7340 -- 2021091714:05:50
FOLD: 1 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1074664.3210 -- VALID RMSE LOSS: 1068704.2740 -- 2021091714:06:04
FOLD: 1 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1037374.1863 -- VALID RMSE LOSS: 1045121.6859 -- 2021091714:06:17
FOLD: 1 -- EPOCH: 30 -- TRAIN RMSE LOSS: 989083.8391 -- VALID RMSE LOSS: 1058336.1955 -- 2021091714:06:31
FOLD: 1 ----- BEST VALID RMSE LOSS: 1045121.6858974359 ----- 20210917 14:06:33

FOLD: 2 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1510505.3073 -- VALID RMSE LOSS: 1215468.2404 -- 2021091714:06:34
FOLD: 2 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1074498.1258 -- VALID RMSE LOSS: 1068608.2260 -- 2021091714:06:48
FOLD: 2 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1029376.4585 -- VALID RMSE LOSS: 1098422.0962 -- 2021091714:07:02
FOLD: 2 -- EPOCH: 30 -- TRAIN RMSE LOSS: 996525.8052 -- VALID RMSE LOSS: 1068577.3285 -- 2021091714:07:16
FOLD: 2 ----- BEST VALID RMSE LOSS: 1055907.5400641025 ----- 20210917 14:07:21

FOLD: 3 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1507352.3306 -- VALID RMSE LOSS: 1246529.2901 -- 2021091714:07:23
FOLD: 3 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1066054.7819 -- VALID RMSE LOSS: 1115729.1314 -- 2021091714:07:37
FOLD: 3 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1032640.0423 -- VALID RMSE LOSS: 1098010.9103 -- 2021091714:07:51
FOLD: 3 -- EPOCH: 30 -- TRAIN RMSE LOSS: 991999.5137 -- VALID RMSE LOSS: 1059628.6571 -- 2021091714:08:05
FOLD: 3 -- EPOCH: 40 -- TRAIN RMSE LOSS: 969133.4976 -- VALID RMSE LOSS: 1069837.6939 -- 2021091714:08:19
FOLD: 3 ----- BEST VALID RMSE LOSS: 1059628.657051282 ----- 20210917 14:08:20

FOLD: 4 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1527716.2464 -- VALID RMSE LOSS: 1323792.5577 -- 2021091714:08:22
FOLD: 4 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1062587.0411 -- VALID RMSE LOSS: 1108771.4135 -- 2021091714:08:36
FOLD: 4 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1033916.9427 -- VALID RMSE LOSS: 1069444.1971 -- 2021091714:08:50
FOLD: 4 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1004854.6681 -- VALID RMSE LOSS: 1089320.7756 -- 2021091714:09:04
FOLD: 4 -- EPOCH: 40 -- TRAIN RMSE LOSS: 972893.1351 -- VALID RMSE LOSS: 1073291.6603 -- 2021091714:09:18
FOLD: 4 -- EPOCH: 50 -- TRAIN RMSE LOSS: 957745.0306 -- VALID RMSE LOSS: 1064847.9856 -- 2021091714:09:32
FOLD: 4 ----- BEST VALID RMSE LOSS: 1055514.1458333333 ----- 20210917 14:09:39

FOLDS:
 FOLD 0: 1076085.86058
 FOLD 1: 1045121.68590
 FOLD 2: 1055907.54006
 FOLD 3: 1059628.65705
 FOLD 4: 1055514.14583

CROSS VALIDATION SCORE: 
 1058451.5778846154 ----- 20210917 14:09

**************
Best CV trial:
  Value:  1052724.7512820512

  Params: 
    n_units_l0: 288
    n_units_l1: 201
    n_units_l2: 135
    n_units_l3: 125
    n_units_l4: 88
    n_units_l5: 63
    n_units_l6: 34
    lr: 0.0041196088163310786
**************

####################################################################################################

TRIAL 10 START TIME: 20210917 14:09:39

OPTIMIZER:
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.004968131398120278
    weight_decay: 0
)

CURRENT TRIAL PARAMETERS: 
    n_units_l0: 295
    n_units_l1: 194
    n_units_l2: 141
    n_units_l3: 120
    n_units_l4: 98
    n_units_l5: 68
    n_units_l6: 25
    lr: 0.004968131398120278


FOLD: 0 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1536391.1278 -- VALID RMSE LOSS: 1271974.5096 -- 2021091714:09:41
FOLD: 0 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1066351.3653 -- VALID RMSE LOSS: 1104732.8446 -- 2021091714:09:55
FOLD: 0 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1037372.4677 -- VALID RMSE LOSS: 1084695.3429 -- 2021091714:10:09
FOLD: 0 -- EPOCH: 30 -- TRAIN RMSE LOSS: 1002755.4637 -- VALID RMSE LOSS: 1083698.2035 -- 2021091714:10:23
FOLD: 0 ----- BEST VALID RMSE LOSS: 1078936.1009615385 ----- 20210917 14:10:28

FOLD: 1 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1520007.9464 -- VALID RMSE LOSS: 1197606.0417 -- 2021091714:10:29
FOLD: 1 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1080722.8891 -- VALID RMSE LOSS: 1055773.3958 -- 2021091714:10:43
FOLD: 1 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1043040.2399 -- VALID RMSE LOSS: 1062949.4824 -- 2021091714:10:58
FOLD: 1 ----- BEST VALID RMSE LOSS: 1050451.6362179487 ----- 20210917 14:11:11

FOLD: 2 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1483334.1560 -- VALID RMSE LOSS: 1180249.5625 -- 2021091714:11:12
FOLD: 2 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1064725.7722 -- VALID RMSE LOSS: 1100858.1987 -- 2021091714:11:26
FOLD: 2 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1035511.9750 -- VALID RMSE LOSS: 1074878.6587 -- 2021091714:11:41
FOLD: 2 -- EPOCH: 30 -- TRAIN RMSE LOSS: 996054.8129 -- VALID RMSE LOSS: 1056916.5048 -- 2021091714:11:55
FOLD: 2 -- EPOCH: 40 -- TRAIN RMSE LOSS: 972618.5988 -- VALID RMSE LOSS: 1075991.0689 -- 2021091714:12:10
FOLD: 2 ----- BEST VALID RMSE LOSS: 1055899.8621794872 ----- 20210917 14:12:22

FOLD: 3 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1497288.2383 -- VALID RMSE LOSS: 1228106.7708 -- 2021091714:12:24
FOLD: 3 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1073073.3843 -- VALID RMSE LOSS: 1089354.8205 -- 2021091714:12:38
FOLD: 3 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1039080.6556 -- VALID RMSE LOSS: 1072643.4519 -- 2021091714:12:53
FOLD: 3 -- EPOCH: 30 -- TRAIN RMSE LOSS: 999343.9423 -- VALID RMSE LOSS: 1076884.3365 -- 2021091714:13:07
FOLD: 3 -- EPOCH: 40 -- TRAIN RMSE LOSS: 970383.8476 -- VALID RMSE LOSS: 1065136.0833 -- 2021091714:13:21
FOLD: 3 ----- BEST VALID RMSE LOSS: 1062534.532051282 ----- 20210917 14:13:27

FOLD: 4 -- EPOCH: 0 -- TRAIN RMSE LOSS: 1504564.8089 -- VALID RMSE LOSS: 1243866.0128 -- 2021091714:13:29
FOLD: 4 -- EPOCH: 10 -- TRAIN RMSE LOSS: 1069205.3798 -- VALID RMSE LOSS: 1097712.4231 -- 2021091714:13:43
FOLD: 4 -- EPOCH: 20 -- TRAIN RMSE LOSS: 1045626.1548 -- VALID RMSE LOSS: 1085087.0561 -- 2021091714:13:58
FOLD: 4 -- EPOCH: 30 -- TRAIN RMSE LOSS: 982405.6784 -- VALID RMSE LOSS: 1064055.1138 -- 2021091714:14:12
FOLD: 4 -- EPOCH: 40 -- TRAIN RMSE LOSS: 970609.5548 -- VALID RMSE LOSS: 1084249.3029 -- 2021091714:14:27
FOLD: 4 ----- BEST VALID RMSE LOSS: 1060688.0673076923 ----- 20210917 14:14:37

FOLDS:
 FOLD 0: 1078936.10096
 FOLD 1: 1050451.63622
 FOLD 2: 1055899.86218
 FOLD 3: 1062534.53205
 FOLD 4: 1060688.06731

CROSS VALIDATION SCORE: 
 1061702.0397435897 ----- 20210917 14:14

**************
Best CV trial:
  Value:  1052724.7512820512

  Params: 
    n_units_l0: 295
    n_units_l1: 194
    n_units_l2: 141
    n_units_l3: 120
    n_units_l4: 98
    n_units_l5: 68
    n_units_l6: 25
    lr: 0.004968131398120278
**************

####################################################################################################

Study statistics: 
  Number of finished trials:  10 

Best trial:
  Value:  1052724.7512820512
  Params: 
    n_units_l0: 291
    n_units_l1: 174
    n_units_l2: 143
    n_units_l3: 115
    n_units_l4: 85
    n_units_l5: 71
    n_units_l6: 30
    lr: 0.000600384947312816



TRIALS DATAFRAME: 
   number         value  ... params_n_units_l6     state
0       0  1.060947e+06  ...                29  COMPLETE
1       1  1.057410e+06  ...                39  COMPLETE
2       2  1.052725e+06  ...                30  COMPLETE
3       3  1.065416e+06  ...                30  COMPLETE
4       4  1.056264e+06  ...                32  COMPLETE
5       5  1.061282e+06  ...                21  COMPLETE
6       6  1.057151e+06  ...                25  COMPLETE
7       7  1.056549e+06  ...                34  COMPLETE
8       8  1.058452e+06  ...                34  COMPLETE
9       9  1.061702e+06  ...                25  COMPLETE

[10 rows x 14 columns]
In [ ]:
pd.read_csv(f'logs/nn_trials_{study_start_time}.csv').sort_values('value').iloc[0]
Out[ ]:
Unnamed: 0                                    2
number                                        2
value                               1.05272e+06
datetime_start       2021-09-17 13:30:31.139703
datetime_complete    2021-09-17 13:39:42.979530
duration                 0 days 00:09:11.839827
params_lr                           0.000600385
params_n_units_l0                           291
params_n_units_l1                           174
params_n_units_l2                           143
params_n_units_l3                           115
params_n_units_l4                            85
params_n_units_l5                            71
params_n_units_l6                            30
state                                  COMPLETE
Name: 2, dtype: object

Neural network architecture search study complete


Above we see one of our best NN trials on the full feature set, the RMSE CV is an improvement upon our grid search Random Forest in the last section.

However, this improvement is marginal (from ~1.07e6 to ~1.05e6) and clearly requires more involved modeling process and is less interpretable than Random Forest or our other baseline models. Also, we should check carefully the relationship between train and test loss, to make sure we are not over fitting. Setting aside a completely untouched holdout set to test our cross validated model would also be a good precaution to take.

Below we have graphics of our model optimization study. These plots help us understand what parameters are most important in our network topology optimzation study. As you can see, our the first trial turned out to have the best parameters of our study, although this would likely change if we increase the number of trials.

In [54]:
study = joblib.load(f'logs/nn_study_20210917 13:20.pkl')
optuna.visualization.plot_optimization_history(study)
In [55]:
optuna.visualization.plot_parallel_coordinate(study)
In [56]:
optuna.visualization.plot_param_importances(study)
In [57]:
optuna.visualization.plot_slice(study)

f. Denoising autoencoder architecture search for latent feature engineering


In this next section we use denoising autoencoders to learn a large set of generalized or denoised features that capture more sophisticated interactions.

Denoising autoencoders are more popular for computer vision tasks, yet we use the swap noise algorithm (defined in Engine class) to inject noise into our taular data. Swap noise randomly picks a proportion of cells in our dataframe and replaces their values with values from randomly sampled rows in the same column.

We will start by defining necessary classes and functions to optimisize DAE network topology (as we did we NN earlier), next we use the DAE on our entire feature set, and finally we train a new NN to see if we improve upon previous benchmarks.


In [ ]:
def define_dae(trial, param_ranges_dict):
    # We optimize the number of layers, hidden units, and noise ratio in each layer.
    layers = []
    n_layers = trial.suggest_int("n_layers", 
                                 param_ranges_dict[f'n_layers'][0],
                                 param_ranges_dict[f'n_layers'][1],
                                 param_ranges_dict[f'n_layers'][2])
    
    out_features_mult = trial.suggest_int("out_features_mult", 
                                          param_ranges_dict[f'out_features_mult'][0], 
                                          param_ranges_dict[f'out_features_mult'][1],  
                                          param_ranges_dict[f'out_features_mult'][2])
    
    in_features = len(feature_columns) - 1  
    batch_norm = trial.suggest_int('batch_norm',
                                   param_ranges_dict['batch_norm'][0],
                                   param_ranges_dict['batch_norm'][1])
    
    out_features = in_features * out_features_mult
    for i in range(n_layers):
        layers.append(nn.Linear(in_features, out_features))
        if batch_norm:
            layers.append(nn.BatchNorm1d(out_features))
            
        layers.append(nn.ReLU())
        in_features = out_features
    layers.append(nn.Linear(in_features, len(feature_columns) - 1))    

    return nn.Sequential(*layers)
    
In [ ]:
def dae_objective(trial, 
              param_ranges_dict,
              save_model, 
              plot_learning_curves = False
             ):  
    
    # create time stamp
    objective_start_time = f'{datetime.datetime.now().strftime("%Y%m%d %H:%M")}'
    print(f"TRIAL {len(study.trials)} START TIME: {objective_start_time}\n")
    
    # set up data and instantiate data loader
    x_train = train[feature_columns].drop('kfold', axis=1).to_numpy()

    train_dataset = Dataset(features=x_train, targets=x_train)
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=64, num_workers=4, shuffle=True
    )

    # instantiate the model.
    model = define_dae(trial, param_ranges_dict).to(DEVICE)
    print(model)

    # generate the learning rate and optimizer
    lr = trial.suggest_float("lr", param_ranges_dict['lr'][0], param_ranges_dict['lr'][1], log=True)        
    optimizer = optim.RMSprop(model.parameters(), lr=lr)
    
    # Print trial parameters        
    print(f"OPTIMIZER:\n{optimizer}\n")
    current_trial = study.trials[len(study.trials) - 1]
    print(f"CURRENT TRIAL PARAMETERS: ")    
    for key, value in current_trial.params.items():
        print("    {}: {}".format(key, value))
    print("\n")

    # Training of the model.
    eng = Engine(model, optimizer, device=DEVICE)
    best_loss = np.inf
    early_stopping_iter = 10
    early_stopping_counter = 0 

    # loop through epochs with early stopping to avoid overfitting
    for epoch in range(EPOCHS):
        train_loss = eng.dae_train(train_loader)                  

        if (epoch % 5 == 0) or (early_stopping_counter >= 15):
            print(
                f"EPOCH: {epoch:.0f} -- RECONSTUCTION BCE LOSS: {train_loss:.4f} -- "
                f"{datetime.datetime.now().strftime('%Y%m%d %H:%M:%S')}" 
            )

        # check if we have improvef on loss
        if train_loss < best_loss:
            best_loss = train_loss
            early_stopping_counter = 0
            if save_model:
                torch.save(model.state_dict(), f"models/dae_{trial.number}_{study_start_time}.bin")
        else:
            early_stopping_counter += 1

        if early_stopping_counter > early_stopping_iter:
            break

    # print final loss
    print(
        f"RECONSTUCTION BCE LOSS: {best_loss:.5f} ----- "
        f"{datetime.datetime.now().strftime('%Y%m%d %H:%M:%S')}\n"
    )
    
    # format and print output
    if len(study.trials) > 1:
        print("**************")
        print("Best trial:")
        best_trial_temp = study.best_trial

        print(f"  Value:  {best_trial_temp.value}\n")
        print("  Params: ")
        for key, value in trial.params.items():
            print("    {}: {}".format(key, value))
        print("**************\n")
    print(f"{100*'#'}\n")
        
    # log performance
    trials_df = study.trials_dataframe()
    trials_df.to_csv(f'logs/dae_trials_{study_start_time}.csv')
    
    return best_loss

Start optimization study of denoising autoencoder with respect to network topology space:


In [ ]:
optuna_params_dict =  {'lr': [.0001, 1e-3],     
                      'batch_norm': [0, 1],
                      'out_features_mult': [5, 10, 1],
                      'n_layers': [3, 5, 1],
                        'batch_size': [0]
                       }

if __name__ == "__main__":
    study_start_time = f'{datetime.datetime.now().strftime("%Y%m%d %H:%M")}'
    study = optuna.create_study(direction="minimize")
    study.optimize(lambda trial: 
                   dae_objective(trial,
                                 param_ranges_dict=optuna_params_dict,
                                 save_model = True, 
                                 plot_learning_curves = False), n_trials=number_of_trials)

    print("Study statistics: ")
    print("  Number of finished trials: ", len(study.trials), "\n")

    print("Best trial:")
    trial = study.best_trial

    print("  Value: ", trial.value)
    print("  Params: ")
    for key, value in trial.params.items():
        print("    {}: {}".format(key, value))
        
    trials_df = study.trials_dataframe()
    print(f"\n\n\nTRIALS DATAFRAME: ")
    print(trials_df)
    
    trials_df.to_csv(f'logs/dae_trials_{study_start_time}.csv')
    joblib.dump(study,f'logs/dae_study_{study_start_time}.pkl')
TRIAL 1 START TIME: 20210917 14:14

Sequential(
  (0): Linear(in_features=135, out_features=810, bias=True)
  (1): BatchNorm1d(810, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU()
  (3): Linear(in_features=810, out_features=810, bias=True)
  (4): BatchNorm1d(810, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (5): ReLU()
  (6): Linear(in_features=810, out_features=810, bias=True)
  (7): BatchNorm1d(810, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (8): ReLU()
  (9): Linear(in_features=810, out_features=135, bias=True)
)
OPTIMIZER:
RMSprop (
Parameter Group 0
    alpha: 0.99
    centered: False
    eps: 1e-08
    lr: 0.00034695979004020213
    momentum: 0
    weight_decay: 0
)

CURRENT TRIAL PARAMETERS: 
    n_layers: 3
    out_features_mult: 6
    batch_norm: 1
    lr: 0.00034695979004020213


EPOCH: 0 -- RECONSTUCTION BCE LOSS: 19.9781 -- 20210917 14:14:41
EPOCH: 5 -- RECONSTUCTION BCE LOSS: 2.2292 -- 20210917 14:14:54
EPOCH: 10 -- RECONSTUCTION BCE LOSS: 2.0319 -- 20210917 14:15:06
EPOCH: 15 -- RECONSTUCTION BCE LOSS: 1.9557 -- 20210917 14:15:19
EPOCH: 20 -- RECONSTUCTION BCE LOSS: 1.9067 -- 20210917 14:15:32
EPOCH: 25 -- RECONSTUCTION BCE LOSS: 1.8128 -- 20210917 14:15:44
EPOCH: 30 -- RECONSTUCTION BCE LOSS: 1.7875 -- 20210917 14:15:57
EPOCH: 35 -- RECONSTUCTION BCE LOSS: 1.7792 -- 20210917 14:16:09
EPOCH: 40 -- RECONSTUCTION BCE LOSS: 1.6801 -- 20210917 14:16:22
EPOCH: 45 -- RECONSTUCTION BCE LOSS: 1.7678 -- 20210917 14:16:34
EPOCH: 50 -- RECONSTUCTION BCE LOSS: 1.7383 -- 20210917 14:16:47
EPOCH: 55 -- RECONSTUCTION BCE LOSS: 1.6617 -- 20210917 14:16:59
EPOCH: 60 -- RECONSTUCTION BCE LOSS: 1.6539 -- 20210917 14:17:12
EPOCH: 65 -- RECONSTUCTION BCE LOSS: 1.6232 -- 20210917 14:17:24
RECONSTUCTION BCE LOSS: 1.57441 ----- 20210917 14:17:24

####################################################################################################

TRIAL 2 START TIME: 20210917 14:17

Sequential(
  (0): Linear(in_features=135, out_features=675, bias=True)
  (1): BatchNorm1d(675, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU()
  (3): Linear(in_features=675, out_features=675, bias=True)
  (4): BatchNorm1d(675, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (5): ReLU()
  (6): Linear(in_features=675, out_features=675, bias=True)
  (7): BatchNorm1d(675, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (8): ReLU()
  (9): Linear(in_features=675, out_features=675, bias=True)
  (10): BatchNorm1d(675, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (11): ReLU()
  (12): Linear(in_features=675, out_features=135, bias=True)
)
OPTIMIZER:
RMSprop (
Parameter Group 0
    alpha: 0.99
    centered: False
    eps: 1e-08
    lr: 0.00019501470220602878
    momentum: 0
    weight_decay: 0
)

CURRENT TRIAL PARAMETERS: 
    n_layers: 4
    out_features_mult: 5
    batch_norm: 1
    lr: 0.00019501470220602878


EPOCH: 0 -- RECONSTUCTION BCE LOSS: 33.3268 -- 20210917 14:17:27
EPOCH: 5 -- RECONSTUCTION BCE LOSS: 2.1457 -- 20210917 14:17:40
EPOCH: 10 -- RECONSTUCTION BCE LOSS: 1.9996 -- 20210917 14:17:53
EPOCH: 15 -- RECONSTUCTION BCE LOSS: 1.9025 -- 20210917 14:18:06
EPOCH: 20 -- RECONSTUCTION BCE LOSS: 1.9205 -- 20210917 14:18:19
EPOCH: 25 -- RECONSTUCTION BCE LOSS: 1.7949 -- 20210917 14:18:32
EPOCH: 30 -- RECONSTUCTION BCE LOSS: 1.7634 -- 20210917 14:18:45
EPOCH: 35 -- RECONSTUCTION BCE LOSS: 1.7156 -- 20210917 14:18:57
EPOCH: 40 -- RECONSTUCTION BCE LOSS: 1.6920 -- 20210917 14:19:10
EPOCH: 45 -- RECONSTUCTION BCE LOSS: 1.7542 -- 20210917 14:19:23
EPOCH: 50 -- RECONSTUCTION BCE LOSS: 1.7599 -- 20210917 14:19:36
EPOCH: 55 -- RECONSTUCTION BCE LOSS: 1.6188 -- 20210917 14:19:49
EPOCH: 60 -- RECONSTUCTION BCE LOSS: 1.6209 -- 20210917 14:20:02
EPOCH: 65 -- RECONSTUCTION BCE LOSS: 1.6800 -- 20210917 14:20:15
RECONSTUCTION BCE LOSS: 1.58643 ----- 20210917 14:20:25

**************
Best trial:
  Value:  1.5744088218002121

  Params: 
    n_layers: 4
    out_features_mult: 5
    batch_norm: 1
    lr: 0.00019501470220602878
**************

####################################################################################################

TRIAL 3 START TIME: 20210917 14:20

Sequential(
  (0): Linear(in_features=135, out_features=945, bias=True)
  (1): BatchNorm1d(945, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU()
  (3): Linear(in_features=945, out_features=945, bias=True)
  (4): BatchNorm1d(945, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (5): ReLU()
  (6): Linear(in_features=945, out_features=945, bias=True)
  (7): BatchNorm1d(945, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (8): ReLU()
  (9): Linear(in_features=945, out_features=135, bias=True)
)
OPTIMIZER:
RMSprop (
Parameter Group 0
    alpha: 0.99
    centered: False
    eps: 1e-08
    lr: 0.0005196949182121734
    momentum: 0
    weight_decay: 0
)

CURRENT TRIAL PARAMETERS: 
    n_layers: 3
    out_features_mult: 7
    batch_norm: 1
    lr: 0.0005196949182121734


EPOCH: 0 -- RECONSTUCTION BCE LOSS: 11.3242 -- 20210917 14:20:28
EPOCH: 5 -- RECONSTUCTION BCE LOSS: 2.1371 -- 20210917 14:20:41
EPOCH: 10 -- RECONSTUCTION BCE LOSS: 2.0183 -- 20210917 14:20:53
EPOCH: 15 -- RECONSTUCTION BCE LOSS: 1.9526 -- 20210917 14:21:06
EPOCH: 20 -- RECONSTUCTION BCE LOSS: 1.8877 -- 20210917 14:21:19
EPOCH: 25 -- RECONSTUCTION BCE LOSS: 1.8677 -- 20210917 14:21:31
EPOCH: 30 -- RECONSTUCTION BCE LOSS: 1.7455 -- 20210917 14:21:44
EPOCH: 35 -- RECONSTUCTION BCE LOSS: 1.7175 -- 20210917 14:21:57
EPOCH: 40 -- RECONSTUCTION BCE LOSS: 1.7934 -- 20210917 14:22:10
EPOCH: 45 -- RECONSTUCTION BCE LOSS: 1.8295 -- 20210917 14:22:22
EPOCH: 50 -- RECONSTUCTION BCE LOSS: 1.7655 -- 20210917 14:22:35
EPOCH: 55 -- RECONSTUCTION BCE LOSS: 1.7086 -- 20210917 14:22:48
RECONSTUCTION BCE LOSS: 1.64850 ----- 20210917 14:22:56

**************
Best trial:
  Value:  1.5744088218002121

  Params: 
    n_layers: 3
    out_features_mult: 7
    batch_norm: 1
    lr: 0.0005196949182121734
**************

####################################################################################################

TRIAL 4 START TIME: 20210917 14:22

Sequential(
  (0): Linear(in_features=135, out_features=1080, bias=True)
  (1): ReLU()
  (2): Linear(in_features=1080, out_features=1080, bias=True)
  (3): ReLU()
  (4): Linear(in_features=1080, out_features=1080, bias=True)
  (5): ReLU()
  (6): Linear(in_features=1080, out_features=135, bias=True)
)
OPTIMIZER:
RMSprop (
Parameter Group 0
    alpha: 0.99
    centered: False
    eps: 1e-08
    lr: 0.0008206095734344681
    momentum: 0
    weight_decay: 0
)

CURRENT TRIAL PARAMETERS: 
    n_layers: 3
    out_features_mult: 8
    batch_norm: 0
    lr: 0.0008206095734344681


EPOCH: 0 -- RECONSTUCTION BCE LOSS: 65.8399 -- 20210917 14:22:58
EPOCH: 5 -- RECONSTUCTION BCE LOSS: 2.8719 -- 20210917 14:23:11
EPOCH: 10 -- RECONSTUCTION BCE LOSS: 2.3915 -- 20210917 14:23:23
EPOCH: 15 -- RECONSTUCTION BCE LOSS: 2.1828 -- 20210917 14:23:36
EPOCH: 20 -- RECONSTUCTION BCE LOSS: 1.9446 -- 20210917 14:23:48
EPOCH: 25 -- RECONSTUCTION BCE LOSS: 1.9370 -- 20210917 14:24:02
EPOCH: 30 -- RECONSTUCTION BCE LOSS: 1.8919 -- 20210917 14:24:14
EPOCH: 35 -- RECONSTUCTION BCE LOSS: 1.8088 -- 20210917 14:24:27
RECONSTUCTION BCE LOSS: 1.78771 ----- 20210917 14:24:27

**************
Best trial:
  Value:  1.5744088218002121

  Params: 
    n_layers: 3
    out_features_mult: 8
    batch_norm: 0
    lr: 0.0008206095734344681
**************

####################################################################################################

TRIAL 5 START TIME: 20210917 14:24

Sequential(
  (0): Linear(in_features=135, out_features=675, bias=True)
  (1): ReLU()
  (2): Linear(in_features=675, out_features=675, bias=True)
  (3): ReLU()
  (4): Linear(in_features=675, out_features=675, bias=True)
  (5): ReLU()
  (6): Linear(in_features=675, out_features=675, bias=True)
  (7): ReLU()
  (8): Linear(in_features=675, out_features=675, bias=True)
  (9): ReLU()
  (10): Linear(in_features=675, out_features=135, bias=True)
)
OPTIMIZER:
RMSprop (
Parameter Group 0
    alpha: 0.99
    centered: False
    eps: 1e-08
    lr: 0.00011190514824410668
    momentum: 0
    weight_decay: 0
)

CURRENT TRIAL PARAMETERS: 
    n_layers: 5
    out_features_mult: 5
    batch_norm: 0
    lr: 0.00011190514824410668


EPOCH: 0 -- RECONSTUCTION BCE LOSS: 3.7357 -- 20210917 14:24:29
EPOCH: 5 -- RECONSTUCTION BCE LOSS: 2.5251 -- 20210917 14:24:42
EPOCH: 10 -- RECONSTUCTION BCE LOSS: 2.4047 -- 20210917 14:24:54
EPOCH: 15 -- RECONSTUCTION BCE LOSS: 2.2775 -- 20210917 14:25:07
EPOCH: 20 -- RECONSTUCTION BCE LOSS: 2.4031 -- 20210917 14:25:19
EPOCH: 25 -- RECONSTUCTION BCE LOSS: 2.2647 -- 20210917 14:25:32
EPOCH: 30 -- RECONSTUCTION BCE LOSS: 2.1355 -- 20210917 14:25:44
EPOCH: 35 -- RECONSTUCTION BCE LOSS: 2.0194 -- 20210917 14:25:57
EPOCH: 40 -- RECONSTUCTION BCE LOSS: 1.9305 -- 20210917 14:26:09
EPOCH: 45 -- RECONSTUCTION BCE LOSS: 1.8515 -- 20210917 14:26:22
EPOCH: 50 -- RECONSTUCTION BCE LOSS: 1.8867 -- 20210917 14:26:34
EPOCH: 55 -- RECONSTUCTION BCE LOSS: 1.8951 -- 20210917 14:26:47
EPOCH: 60 -- RECONSTUCTION BCE LOSS: 1.7857 -- 20210917 14:26:59
EPOCH: 65 -- RECONSTUCTION BCE LOSS: 1.7887 -- 20210917 14:27:12
EPOCH: 70 -- RECONSTUCTION BCE LOSS: 1.8203 -- 20210917 14:27:25
EPOCH: 75 -- RECONSTUCTION BCE LOSS: 1.7447 -- 20210917 14:27:37
EPOCH: 80 -- RECONSTUCTION BCE LOSS: 1.7163 -- 20210917 14:27:50
EPOCH: 85 -- RECONSTUCTION BCE LOSS: 1.8065 -- 20210917 14:28:02
EPOCH: 90 -- RECONSTUCTION BCE LOSS: 1.7137 -- 20210917 14:28:14
RECONSTUCTION BCE LOSS: 1.67321 ----- 20210917 14:28:25

**************
Best trial:
  Value:  1.5744088218002121

  Params: 
    n_layers: 5
    out_features_mult: 5
    batch_norm: 0
    lr: 0.00011190514824410668
**************

####################################################################################################

Study statistics: 
  Number of finished trials:  5 

Best trial:
  Value:  1.5744088218002121
  Params: 
    n_layers: 3
    out_features_mult: 6
    batch_norm: 1
    lr: 0.00034695979004020213



TRIALS DATAFRAME: 
   number     value  ... params_out_features_mult     state
0       0  1.574409  ...                        6  COMPLETE
1       1  1.586433  ...                        5  COMPLETE
2       2  1.648500  ...                        7  COMPLETE
3       3  1.787713  ...                        8  COMPLETE
4       4  1.673208  ...                        5  COMPLETE

[5 rows x 10 columns]
In [ ]:
pd.read_csv(f'logs/dae_trials_{study_start_time}.csv').sort_values('value').iloc[0]
Out[ ]:
Unnamed: 0                                           0
number                                               0
value                                          1.57441
datetime_start              2021-09-17 14:14:39.271087
datetime_complete           2021-09-17 14:17:24.993501
duration                        0 days 00:02:45.722414
params_batch_norm                                    1
params_lr                                   0.00034696
params_n_layers                                      3
params_out_features_mult                             6
state                                         COMPLETE
Name: 0, dtype: object

Denoising autoencoder architecture search study complete


Above we see one of our best dae trials on the full feature set. This model produces a denoised, more general version of our features that is most similar to our original input. Now that we have the trained model, we can extract the activations from our entire sample as our new feature set in the next section.

Below we have graphics of our model optimization study. These plots help us understand what parameters are most important in our network topology optimzation study.

In [31]:
study = joblib.load(f'logs/dae_study_20210917 14:14.pkl')
optuna.visualization.plot_optimization_history(study)
In [32]:
optuna.visualization.plot_parallel_coordinate(study)
In [33]:
optuna.visualization.plot_param_importances(study)
In [34]:
optuna.visualization.plot_slice(study)

g. Denoising autoencoder latent feature extraction

In this section we pass our sample through our trained denoising autoencoder and grab the activations from a middle layer. We will then use these features as a new, more generalized feature set to predict our target.


In [ ]:
def DAE(dae_best_params):
    # We optimize the number of layers, hidden units, and noise ratio in each layer.
    layers = []

    n_layers = dae_best_params[f'n_layers']
    
    out_features_mult = dae_best_params[f'out_features_mult']
    
    in_features = len(feature_columns) - 1  

    batch_norm = dae_best_params['batch_norm']
                                  
    out_features = in_features * out_features_mult
    
    for i in range(n_layers):
        layers.append(nn.Linear(in_features, out_features))
        if batch_norm:
            layers.append(nn.BatchNorm1d(out_features))
            
        layers.append(nn.ReLU())
        in_features = out_features
    layers.append(nn.Linear(in_features, len(feature_columns) - 1))    

    return nn.Sequential(*layers)

Get best model parameters from optuna study


In [ ]:
dae_best_params_df = pd.read_csv(f'logs/dae_trials_{study_start_time}.csv').sort_values('value').iloc[0]
dae_best_params_df
Out[ ]:
Unnamed: 0                                           0
number                                               0
value                                          1.57441
datetime_start              2021-09-17 14:14:39.271087
datetime_complete           2021-09-17 14:17:24.993501
duration                        0 days 00:02:45.722414
params_batch_norm                                    1
params_lr                                   0.00034696
params_n_layers                                      3
params_out_features_mult                             6
state                                         COMPLETE
Name: 0, dtype: object
In [ ]:
dae_params = dae_best_params_df[dae_best_params_df.index.map(lambda x: x in ['params_n_layers','params_out_features_mult','params_batch_norm'])]
dae_best_params = {f"{i[7:]}":dae_params[i]  for i in dae_params.index}
dae_best_params
Out[ ]:
{'batch_norm': 1, 'n_layers': 3, 'out_features_mult': 6}

Instantiate denoising autocoder with weights from best trained dae


In [ ]:
model = DAE(dae_best_params)
model.load_state_dict(torch.load(f'models/dae_{dae_best_params_df["number"]}_{study_start_time}.bin'))
model.to(DEVICE)
model
Out[ ]:
Sequential(
  (0): Linear(in_features=135, out_features=810, bias=True)
  (1): BatchNorm1d(810, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU()
  (3): Linear(in_features=810, out_features=810, bias=True)
  (4): BatchNorm1d(810, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (5): ReLU()
  (6): Linear(in_features=810, out_features=810, bias=True)
  (7): BatchNorm1d(810, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (8): ReLU()
  (9): Linear(in_features=810, out_features=135, bias=True)
)
In [ ]:
# set up data and instantiate data loader
x_train = train[feature_columns].to_numpy()
y_train = train[[target_columns]+['kfold']].to_numpy()

train_dataset = Dataset(features=x_train, targets=y_train)
train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=64, num_workers=4,
)
In [ ]:
# a dict to store the activations
activation = {}
def getActivation(name):
  # the hook signature
  def hook(model, input, output):
    activation[name] = output.detach()
  return hook
In [ ]:
# grabbing layers
dae_n_layers = dae_best_params_df['params_n_layers']
if dae_best_params['batch_norm']:
    activation_1 = str(3*dae_n_layers - 4)
    activation_2 = str(3*dae_n_layers - 1)
else:
    activation_1 = str(2*dae_n_layers - 3)
    activation_2 = str(2*dae_n_layers - 1)
print(f'grabbing layers: {activation_1}, {activation_2}')
grabbing layers: 5, 8
In [ ]:
# register forward hooks on the layers of choice
h1 = model._modules[activation_1].register_forward_hook(getActivation(activation_1))
h2 = model._modules[activation_2].register_forward_hook(getActivation(activation_2))
In [ ]:
relu_output_1, relu_output_2, dae_targets_b = [], [], []

# go through all the batches in the dataset
for i, batch in enumerate(train_loader):

  # forward pass -- getting the outputs
  inputs = batch["x"].to(DEVICE)
  outputs = model(inputs)
  targets = batch["y"][:, 0]
  # collect the activations in the correct list
  relu_output_1.append(activation[activation_1])
  relu_output_2.append(activation[activation_2])
  dae_targets_b.append(batch["y"])

# detach the hooks
h1.remove()
h2.remove()

# stack batches from two layers and average
dae_layer_outputs_1 = torch.cat(relu_output_1)
dae_layer_outputs_2 = torch.cat(relu_output_2)
dae_targets = torch.cat(dae_targets_b).detach().numpy()
dae_lat_feats_arr = torch.mean(torch.stack((dae_layer_outputs_1, dae_layer_outputs_2)), 0).cpu().detach().numpy()

# create new df with dae features and target
dae_feats_dict = {f'dae_{i+1}':dae_lat_feats_arr[:,i] for i in range(len(dae_lat_feats_arr[0,:]))}
dae_feats_dict['SALE PRICE'] = dae_targets[:,0].astype(int)
dae_feats_dict['kfold'] = dae_targets[:,1].astype(int)
dae_latent_features = pd.DataFrame(dae_feats_dict)
In [ ]:
# make sure our latent features are right size
assert (dae_latent_features['SALE PRICE'] == train['SALE PRICE']).sum()
assert (dae_latent_features['kfold'] == train['kfold']).sum()
assert len(dae_latent_features) == len(train)

h. Neural network prediction of denoised features

In this section we take the denoised feature set and predict our target using an optimized neural network.


In [ ]:
# dictionary for network parameters
optuna_params_dict = {'lr': [.001, .01],
                      
                      'n_units_l0': [1000, 1400],
                      'n_units_l1': [700, 1100],
                      'n_units_l2': [500, 800],
                      'n_units_l3': [300, 600],
                      'n_units_l4': [150, 300],
                      'n_units_l5': [80, 170],
                      'n_units_l6': [40, 90]}

train = dae_latent_features[:].copy()
feature_columns, target_columns = list(train.columns)[:-2] + ['kfold'], 'SALE PRICE' 

if __name__ == "__main__":  
    study_start_time = f'{datetime.datetime.now().strftime("%Y%m%d %H:%M")}'
    study = optuna.create_study(direction="minimize")
    study.optimize(lambda trial: objective(trial, optuna_params_dict, save_model=True), n_trials=number_of_trials)

    print("Study statistics: ")
    print("  Number of finished trials: ", len(study.trials), "\n")

    print("Best trial:")
    trial = study.best_trial

    print("  Value: ", trial.value)
    print("  Params: ")
    for key, value in trial.params.items():
        print("    {}: {}".format(key, value))
        
    trials_df = study.trials_dataframe()
    print(f"\n\n\nTRIALS DATAFRAME: ")
    print(trials_df)

    joblib.dump(study,f'logs/nn_denoised_study_{study_start_time}.pkl')
In [ ]:
pd.read_csv(f'logs/nn_trials_{study_start_time}.csv').sort_values('value').iloc[0]
Out[ ]:
Unnamed: 0                                    1
number                                        1
value                               1.16192e+06
datetime_start       2021-09-17 14:37:39.889460
datetime_complete    2021-09-17 14:50:49.858231
duration                 0 days 00:13:09.968771
params_lr                            0.00103172
params_n_units_l0                          1302
params_n_units_l1                           835
params_n_units_l2                           612
params_n_units_l3                           514
params_n_units_l4                           173
params_n_units_l5                           108
params_n_units_l6                            46
state                                  COMPLETE
Name: 1, dtype: object

Neural network trained on denoised features study complete


Above we see results from best neural network prediction trial on the denoised features. This model produces the best nn on the denoised latent feature set. The RMSE from this model didn't surpass our best model (neural network on full feature set), but could maybe be improved with more tinkering to the swap noise algorithm we used to add noise.

Below we have graphics of our model optimization study. These plots help us understand what parameters are most important in our network topology optimzation study.

In [40]:
study = joblib.load(f'logs/nn_denoised_study_20210917 14:28.pkl') 
fig = optuna.visualization.plot_optimization_history(study)
fig.show()
In [36]:
optuna.visualization.plot_parallel_coordinate(study)
In [37]:
optuna.visualization.plot_param_importances(study)
In [38]:
optuna.visualization.plot_slice(study)

5. Takeaways and next steps:


a. Data cleaning:

This data was fairly clean, but extracting more signal from it would have taken considerably more time, e.g. extracting street name features from address.

We could have spent a lot longer on the data cleaning process. We could have tried more approaches to missing data (discovering if there were patterns in the datas' missingness), encoding categorical data, and feature scaling.

With more time, I would have cleaned and encoded street names from the address variable, or investigated the square footage variables, they potentially could have added a lot of signal to our modeling. Also, would have been interesting to include all 5 boroughs.

Also, for predictive modeling purposes, we would want to be more careful regarding data leakage, which I am sure I am commiting here. Making sure that we split our data before imputation or scaling would help us ensure that our unseen data and models do not receive information they wouldn't have access to in a deployed prediction setting.

b. EDA:

Behind the scense, we investingated the data heavily to build intution about it. In this notebook, we mainly showed correlations and summary stats. These helped show the relationships between variables and their distributions.

We could have spent more time looking at the distributions of individual varibles as well as relationships between variables to detect redundancy for feature selection. Correlation was somewhat lackluster as most of our varibles were dummied nominal variables. Also, we could have plotted latent features using dimension reduction techniques.

c. Hypothesis testing:

We defined two separate null hypthoses (f-test and t-test) for our multiple linear regression models. We found p-values below our significance value (0.05) for both our f-test and our t-test, so we were able to reject both null nypotheses, which told us there is evidence of these relationships between our predictors and outcome.

We could have defined more informed null hypothses, as well as compared f-test statistics between models with different predictors sets to infer which predictors improved our models. Also, using hierachical linear models or mixed effect models could have helped us assess the effects of some of the higher level/cluster features (e.g. neighborhood, zip code, building type, etc.). While dummying out our features accomplished something similar, it would have been interesting to see output of random effects models and if it changed anything.

d. Predictive modeling:

We found a lot of evidence to support our hypothesis that model performance (assessed on RMSE) would improve with larger predictor sets and more complex ML models. Most of the improvement could be made by hyperparameter tuned ML models alone, Random Forest in particular.

Our supervised neural network with the original feature set further improve on our ML models, getting us down to RMSE below 1,040,000.

Latent features extracted from our autoencoder did not add to model performance, although maybe we could see improvement with more tinkering with how we added noise and picked our best autoencoder.

One important point is that we need to be sensitive to the complexity-interpretability trade off. Although I have seen autoencoders for feature engineering add substantially to model performance, this was clearly unfruitful here. In many cases, probably this once included, it often makes sense to avoid more complex techiniques in the interest of preserving model interpretability and time.

Also, with more time, should check carefully the relationship between train and test loss (especially with NN), to make sure we are not over fitting. Setting aside a completely untouched holdout set to test our cross validated model would also be a good precaution to take.

e. Time spent:

Overall, I probably spent ~5-10 hours on this assignment. Most of the time was spent up until the denoising autoencoder portion, although it is hard to say because some of the latter analysis was half-baked ideas or code I had already been working on. Unsupervised deep learning and autoencoders have really caught my interest recently so much of this anaylsis was already in the works. Admittedly, I got slighlty carried away but thought it would be interesting to test these techniques on this assignment. These techniques, maybe unsuprisingly, didn't add a large bump to performance, but have seen great results from them on others occasions. Nonetheless, all parts of this assignment were interesting, enjoyable, and thank you for reading!